Closed SilasK closed 1 year ago
After some trial and error, it seems that putting `mouse gut metagenome` as `metagenome` is the right thing. Otherwise, I get the error `ERROR: metagenomes associated with each genome need to belong to ENA's approved metagenomes list.`, which brings me back to my initial error.
If I follow the URL indicated in the warning (https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/), I get to a page showing `{ "error": "Search value must be provided." }`.
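The "Search value must be provided." response suggests that the logged URL is missing the name to search for. A minimal sketch of building the full lookup URL, assuming the scientific name is simply appended to the endpoint as a URL-encoded path segment (the helper names `taxonomy_url` and `lookup` are hypothetical and not part of genome_upload.py):

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint taken from the warning quoted above.
ENA_TAXONOMY_URL = "https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/"


def taxonomy_url(scientific_name: str) -> str:
    """Build the lookup URL for one scientific name (hypothetical helper)."""
    name = scientific_name.strip()
    if not name:
        # An empty name is what yields "Search value must be provided."
        raise ValueError("a scientific name is required")
    return ENA_TAXONOMY_URL + quote(name)


def lookup(scientific_name: str):
    """Fetch and decode the JSON reply for a name (needs network access)."""
    with urlopen(taxonomy_url(scientific_name), timeout=30) as response:
        return json.load(response)


print(taxonomy_url("mouse gut metagenome"))
```

Calling `lookup("mouse gut metagenome")` would then query ENA for that name; the shape of the reply is not shown in this thread, so the sketch returns the decoded JSON unchanged.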
My test table looks like this now:
genome_name | run_accessions | assembly_software | binning_software | binning_parameters | stats_generation_software | completeness | contamination | rRNA_presence | NCBI_lineage | metagenome | co-assembly | genome_coverage | genome_path | broad_environment | local_environment | environmental_medium |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MGG00002 | ERR1989816 | metaSpades_v3.13 | metagenome_atlas_v2.3 | default | checkM_v1.1 | 84.91 | 1.32 | FALSE | d__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__;s__ | mouse gut metagenome | FALSE | 100 | genomes/MGG00002.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00003 | ERR1989816 | metaSpades_v3.13 | metagenome_atlas_v2.3 | default | checkM_v1.1 | 95.7 | 1.08 | FALSE | d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__;s__ | mouse gut metagenome | FALSE | 100 | genomes/MGG00003.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00005 | ERR1989816 | metaSpades_v3.13 | metagenome_atlas_v2.3 | default | checkM_v1.1 | 95.91 | 0 | FALSE | d__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes;s__ | mouse gut metagenome | FALSE | 100 | genomes/MGG00005.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00007 | ERR1989816 | metaSpades_v3.13 | metagenome_atlas_v2.3 | default | checkM_v1.1 | 95.45 | 1.72 | FALSE | d__Bacteria;p__Firmicutes;c__Clostridia;o__Eubacteriales;f__Lachnospiraceae;g__;s__ | mouse gut metagenome | FALSE | 100 | genomes/MGG00007.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00008 | ERR1989816 | metaSpades_v3.13 | metagenome_atlas_v2.3 | default | checkM_v1.1 | 98.92 | 0 | FALSE | d__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Odoribacteraceae;g__Odoribacter;s__ | mouse gut metagenome | FALSE | 100 | genomes/MGG00008.fasta.gz | Host-associated | Mouse digestive system | Cecum |
Hi Silas,
As you say, unfortunately the `Failed to validate sample xml, error: Invalid decimal value: expected at least one digit` error is quite uninformative, as it doesn't point to a specific field. This error is returned by ENA at registration time, so it's difficult to parse it and provide any deeper insight. I took a look at your tsv and noticed that the error is probably generated in the "contamination" column, where some contamination values are set to `0`. I believe ENA would expect it to be `0.0`. I am going to add a check in the script between today and tomorrow.
I would therefore suggest restoring the original `mouse gut metagenome` value in the metagenome column, as it is the most accurate for your data.
About error 400 - this is actually a logging error and I will take care of removing it. Thanks for pointing it out.
Finally, I will reply about coverage in the other issue.
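For illustration, the kind of check described above could look roughly like the sketch below. It is not the code that went into genome_upload.py; the column list and the function names are assumptions, and whether ENA really rejects a bare `0` is only a guess at this point in the thread.

```python
import csv
import io
import re

# Assumed set of columns that should hold decimal values.
DECIMAL_COLUMNS = ("completeness", "contamination", "genome_coverage")
NUMBER = re.compile(r"\d+(\.\d+)?")


def as_decimal_text(value: str) -> str:
    """Return the value with an explicit fractional part, e.g. '0' -> '0.0'."""
    text = value.strip()
    if not NUMBER.fullmatch(text):
        raise ValueError(f"not a plain non-negative number: {value!r}")
    return text if "." in text else text + ".0"


def normalise_rows(rows):
    """Yield copies of the rows with the decimal columns rewritten."""
    for row in rows:
        fixed = dict(row)
        for column in DECIMAL_COLUMNS:
            fixed[column] = as_decimal_text(fixed[column])
        yield fixed


# Tiny tab-separated example with only the relevant columns.
tsv = "genome_name\tcompleteness\tcontamination\tgenome_coverage\nMGG00005\t95.91\t0\t100\n"
for row in normalise_rows(csv.DictReader(io.StringIO(tsv), delimiter="\t")):
    print(row)
```

Raising on anything that is not a plain number means an empty or mistyped cell is reported with its value, instead of surfacing later as the generic ENA validation message.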
I looked at your logs again and noticed you are probably using the `--xmls` and `--manifests` options together. As a heads up, unless your xml needs to be rewritten, the two options can be used separately. This definitely improves performance for large numbers of genomes.
So do I run it first with xmls, or with manifests, or with neither of those?
`--xmls` generates the first xml files; it has to be used for xmls to be generated or updated. Manifest generation is the following step, and needs the xmls to exist in order to work. Therefore, you can use these options as you prefer, as long as the xml step is run at some point before manifest generation. Mine was just a suggestion in terms of performance: once your xmls are generated, you can omit the `--xmls` option and just use the `--manifests` one.
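As a rough sketch of that two-step use, the snippet below only assembles the two command lines, reusing flags that appear in the command quoted in this thread. The script path and the `out_dir` output directory are placeholders, `--force` is left out because its effect is not described here, and it is an assumption that both steps accept the same remaining flags.

```python
import shlex

SCRIPT = "genome_upload.py"  # placeholder path to the script

# Flags shared by both steps, copied from the command quoted in this thread.
COMMON = [
    "-u", "PRJNA646353",
    "--genome_info", "genome_uplod_table_test.txt",
    "--mags",
    "--out", "out_dir",  # placeholder output directory
    "--centre_name", "University of Geneva",
    "--webin", "Webin-XXX",
    "--password", "XXXXX",
]


def build_command(step_flag: str) -> list:
    """Return the argument list for one step (hypothetical helper)."""
    return ["python", SCRIPT, *COMMON, step_flag]


xml_step = build_command("--xmls")            # run once, or when xmls must change
manifest_step = build_command("--manifests")  # run afterwards, xmls must exist

print(shlex.join(xml_step))
print(shlex.join(manifest_step))
```

Each list can be passed to `subprocess.run(..., check=True)` in order, so that the manifest step only starts if the xml step succeeded.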
I put `0.00` in contamination and also `10.99` in genome coverage, but I get the same error.
My command:
python ~/CMMG/genome_upload.py -u PRJNA646353 \
--genome_info genome_uplod_table_test.txt \
--mags --xmls --manifests --out ~/s/CMMG/ \
--centre_name 'University of Geneva' \
--webin Webin-XXX --password XXXXX \
--force
Hi Silas, thanks for providing the tsv file. It allowed me to identify the issue within the parsing of the taxonomy field. May I ask you - is the taxonomy you provided in NCBI or GTDB format?
It's GTDB, but then converted to NCBI using the majority vote script from GTDB-tk. I would prefer putting in the GTDB taxonomy, but you require NCBI, don't you? Are the empty genera/species a problem?
Hi Silas, then it's all good, I was just double-checking! But yes, ENA requires NCBI annotations for the submission. I have deployed the new version of the code, which should take care of the issues mentioned above. Let me know if any other problem comes up! Hopefully, that won't be the case.
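To make the question about empty genera/species concrete, here is a small sketch of reading a lineage string in the `d__...;s__` form used in the tables of this thread and finding the lowest rank that is actually classified. It is an illustration only and not how genome_upload.py parses the field; the function names are hypothetical.

```python
# Rank prefixes as they appear in the lineage strings of this thread.
RANK_PREFIXES = ("d", "p", "c", "o", "f", "g", "s")


def parse_lineage(lineage: str) -> list:
    """Return (prefix, name) pairs; name is '' for unclassified ranks."""
    ranks = []
    for field in lineage.strip().split(";"):
        prefix, separator, name = field.partition("__")
        if not separator or prefix not in RANK_PREFIXES:
            raise ValueError(f"unexpected lineage field: {field!r}")
        ranks.append((prefix, name.strip()))
    return ranks


def lowest_classified(lineage: str) -> tuple:
    """Return the last (prefix, name) pair whose name is not empty."""
    classified = [(p, n) for p, n in parse_lineage(lineage) if n]
    if not classified:
        raise ValueError("lineage has no classified rank")
    return classified[-1]


print(lowest_classified(
    "d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__;s__"
))
```

Treating empty ranks as "not classified" rather than as an error lets a genome that is only resolved to class or family level still map to a usable name.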
It seems that I can run the script now.
I tried to upload some genomes and encountered an error.
The error message doesn't tell me where the error lies. I thought it was due to the `metagenome` column, which I had set to `mouse gut metagenome`. I replaced it with the number `410661`, but then I got the following error: `ERROR: metagenomes associated with each genome need to belong to ENA's approved metagenomes list.`
It might also be due to the fact that I have `0` in the `genome_coverage` column. See #2.
Input table
genome_name | run_accessions | assembly_software | binning_software | binning_parameters | stats_generation_software | completeness | contamination | rRNA_presence | NCBI_lineage | metagenome | co-assembly | genome_coverage | genome_path | broad_environment | local_environment | environmental_medium |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MGG00002 | ERR1989816 | metaSpades v3.13 | metagenome-atlas v2.3 | default | checkM v1.1 | 84.91 | 1.32 | FALSE | d__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__;s__ | 410661 | FALSE | 0 | genomes/MGG00002.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00003 | ERR1989816 | metaSpades v3.13 | metagenome-atlas v2.3 | default | checkM v1.1 | 95.7 | 1.08 | FALSE | d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__;s__ | 410661 | FALSE | 0 | genomes/MGG00003.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00005 | ERR1989816 | metaSpades v3.13 | metagenome-atlas v2.3 | default | checkM v1.1 | 95.91 | 0 | FALSE | d__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes;s__ | 410661 | FALSE | 0 | genomes/MGG00005.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00007 | ERR1989816 | metaSpades v3.13 | metagenome-atlas v2.3 | default | checkM v1.1 | 95.45 | 1.72 | FALSE | d__Bacteria;p__Firmicutes;c__Clostridia;o__Eubacteriales;f__Lachnospiraceae;g__;s__ | 410661 | FALSE | 0 | genomes/MGG00007.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00008 | ERR1989816 | metaSpades v3.13 | metagenome-atlas v2.3 | default | checkM v1.1 | 98.92 | 0 | FALSE | d__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Odoribacteraceae;g__Odoribacter;s__ | 410661 | FALSE | 0 | genomes/MGG00008.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00009 | ERR1989816 | metaSpades v3.13 | metagenome-atlas v2.3 | default | checkM v1.1 | 95.09 | 0 | FALSE | d__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Muribaculaceae;g__Muribaculum;s__Muribaculum intestinale | 410661 | FALSE | 0 | genomes/MGG00009.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00010 | ERR1989816 | metaSpades v3.13 | metagenome-atlas v2.3 | default | checkM v1.1 | 94.18 | 0.22 | FALSE | d__Bacteria;p__Firmicutes;c__Clostridia;o__Eubacteriales;f__Christensenellaceae;g__Christensenella;s__ | 410661 | FALSE | 0 | genomes/MGG00010.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00011 | ERR1989816 | metaSpades v3.13 | metagenome-atlas v2.3 | default | checkM v1.1 | 97.28 | 0.57 | FALSE | d__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Muribaculaceae;g__;s__ | 410661 | FALSE | 0 | genomes/MGG00011.fasta.gz | Host-associated | Mouse digestive system | Cecum |
MGG00012 | ERR1989816 | metaSpades v3.13 | metagenome-atlas v2.3 | default | checkM v1.1 | 96.86 | 0.38 | FALSE | d__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Muribaculaceae;g__;s__ | 410661 | FALSE | 0 | genomes/MGG00012.fasta.gz | Host-associated | Mouse digestive system | Cecum |