Closed bradfordcondon closed 5 years ago
<FtpPath_GenBank>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/704/625/GCA_000704625.1_ASM70462v1</FtpPath_GenBank>
<FtpPath_RefSeq>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/704/625/GCF_000704625.1_ASM70462v1</FtpPath_RefSeq>
<FtpPath_Assembly_rpt/>
<FtpPath_Stats_rpt/>
<FtpPath_Regions_rpt/>
There's no stats FTP path, which is where we get the tag from. Without that we're SOL.
Interestingly, GenBank still has the info:
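A minimal sketch of how the empty path could be detected, in Python purely for illustration (it is not the module's actual code). The helper name `get_ftp_path` is hypothetical, and it assumes the fragment pasted above gets wrapped in a single root element before parsing:

```python
import xml.etree.ElementTree as ET


def get_ftp_path(xml_fragment, tag):
    """Return the text of `tag`, or None if it is missing or empty.

    Hypothetical helper: wraps the fragment in a dummy <root> element
    so that several sibling elements can be parsed at once.
    """
    root = ET.fromstring("<root>" + xml_fragment + "</root>")
    node = root.find(tag)
    if node is None or node.text is None:
        # Covers both an absent element and a self-closed one
        # such as <FtpPath_Stats_rpt/>.
        return None
    text = node.text.strip()
    return text or None


fragment = (
    "<FtpPath_GenBank>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/704/625/"
    "GCA_000704625.1_ASM70462v1</FtpPath_GenBank>"
    "<FtpPath_Stats_rpt/>"
)
stats_path = get_ftp_path(fragment, "FtpPath_Stats_rpt")
genbank_path = get_ftp_path(fragment, "FtpPath_GenBank")
```

Here `stats_path` comes back as `None`, which is the case we currently can't handle.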
Assembly method: MaSuRCA v. 2.0.3.1
Where does it get it from?
The report IS available. It's here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/704/625/GCF_000704625.1_ASM70462v1/GCF_000704625.1_ASM70462v1_assembly_report.txt
However... that path isn't in the XML.
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/704/625/GCF_000704625.1_ASM70462v1
So... we could try to guess the location of the report if we don't find it? Take the RefSeq path, then append the folder name and _assembly_report.txt?
Shoot, I'm having this same problem for locust https://www.ncbi.nlm.nih.gov/assembly/GCA_000516895.1 . This means we really have to fix it.
It only has one link:
<FtpPath_GenBank>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/516/895/GCA_000516895.1_LocustGenomeV1</FtpPath_GenBank>
The report can be found here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/516/895/GCA_000516895.1_LocustGenomeV1/GCA_000516895.1_LocustGenomeV1_assembly_stats.txt
So it seems like the correct thing to do is guess, as suggested above.
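A rough sketch of that guess, in Python just to illustrate the idea (not the module's real implementation). The function name `guess_report_urls` is made up, and the two suffixes are simply the ones seen in the URLs in this thread; whether every assembly folder has both files is an assumption:

```python
import posixpath


def guess_report_urls(ftp_path,
                      suffixes=("_assembly_stats.txt", "_assembly_report.txt")):
    """Guess report URLs from a GenBank or RefSeq FTP folder path.

    Hypothetical helper: takes the last path component (the folder name)
    and appends it, plus each suffix, to the folder path.
    """
    base = ftp_path.rstrip("/")
    folder = posixpath.basename(base)
    return [base + "/" + folder + suffix for suffix in suffixes]


locust = guess_report_urls(
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/516/895/"
    "GCA_000516895.1_LocustGenomeV1"
)
refseq = guess_report_urls(
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/704/625/"
    "GCF_000704625.1_ASM70462v1"
)
```

The caller would only fall back to these guesses when the XML has no report path, and would still need to handle the case where none of the guessed files exists.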
program is a required column in analysis.
When loading (accidentally, it's a bacterium) assembly 185471: