itiago opened this issue 5 years ago
Hi Gherman, have you had time to look at this error? Am I doing something wrong? Thank you for any input. Best,
Are you sure that the contigs in the Coverage_EnergeticaPathwaysGenes/ bins have exactly the same names as those in epathCoverageResults/AC3_bulkcontigs_renamed.fas? Try running without providing the -a option.
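One quick way to answer that question is to compare the FASTA headers directly. This is a minimal Python sketch, not part of metaWRAP: it assumes each bin is a FASTA file in a single folder (with one of the extensions in `FASTA_SUFFIXES`, an assumption) and that a contig's name is the first whitespace-delimited token of its header line. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: bin files use one of these extensions.
FASTA_SUFFIXES = {".fa", ".fasta", ".fas", ".fna"}


def fasta_ids(path):
    """Return the set of sequence IDs (first token after '>') in a FASTA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:  # skip empty headers instead of crashing
                    ids.add(tokens[0])
    return ids


def bin_ids(bins_dir):
    """Map each bin file name to the set of contig IDs it contains."""
    return {
        p.name: fasta_ids(p)
        for p in sorted(Path(bins_dir).iterdir())
        if p.suffix in FASTA_SUFFIXES
    }


def check_names(bins_dir, assembly_fasta):
    """For each bin: (total contigs, contigs missing from assembly, up to 5 examples)."""
    assembly = fasta_ids(assembly_fasta)
    report = {}
    for name, ids in bin_ids(bins_dir).items():
        missing = ids - assembly
        report[name] = (len(ids), len(missing), sorted(missing)[:5])
    return report
```

For example, `check_names("Coverage_EnergeticaPathwaysGenes", "epathCoverageResults/AC3_bulkcontigs_renamed.fas")` would report, per bin file, how many of its contig names are absent from the assembly; if every bin reports all of its contigs as missing, the naming does not match.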
Yes, I am sure; they came from the same file. I've also tried running without the -a option, and the error is the same.
(Email reply to Gherman V. Uritskiy's comment of Tuesday, 28/05/2019, 18:01.)
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/bxlab/metaWRAP/issues/184?email_source=notifications&email_token=AEAA5GPOEP6CT6MYVZBIVLLPXVQPBA5CNFSM4HPS33GKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWMY4DQ#issuecomment-496602638, or mute the thread https://github.com/notifications/unsubscribe-auth/AEAA5GNNFU6N7INF4DBC2ODPXVQPBANCNFSM4HPS33GA .
Can you provide the AC3_2MetaGenMerged_R1_val.quant.counts file?
I didn't know which file was which, so I zipped the whole folder.
(Email reply to Gherman V. Uritskiy's comment of Tue, May 28, 2019 at 6:08 PM.)
-- Igor Tiago Researcher Universidade de Coimbra
Laboratório Microbiologia Edificio Patronato Rua da Matemática Nº 49 3000-276 Coimbra, Portugal
I don't see an attachment...
I sent it by email; maybe that is why it didn't go through.
Your output should have a quant_files folder. Do you have this? I will also need the bins folder.
Here is the complete output. Thank you for the help! CoverageMetaWrap_output_unbinned.zip
If you look at the output file, it reads:
None of the contigs/scaffolds in the -a metagenomic assembly file were present in the bin files. Please make sure that the bins and total assembly have the exact same bins. One cause for this could be that you reassembled the bins, disrupring the contig naming. If you do not have the original total metagenomic assembly file, then you could not provide the -a option at all (but this is not ideal for abundance estimation).
This means 2 things:
But why does it happen when I run it without -a assemble.fas? Can I send the files privately?
That is a good question, but I won't know until I see them. And sure, feel free to email me; maybe just a few problematic ones, so I can see what's wrong.
The bin (protein) files you gave me use a different naming convention from the contig names in the metaWRAP output you gave me, for example k121_28102_ vs k121_4_flag_1_multi_5.0000_len_622. The fact that the metaWRAP output has the full contig names means that you probably gave it the -a option. It's possible that re-running the module on the same output did not overwrite your earlier attempts. I don't think there is an error here; you just need to re-run the program. Make sure you delete the old metaWRAP output, and do not provide the -a option.
Two more things. First, when working with gene names in FASTA format, be careful to make sure every identifier is unique. I noticed you simply truncated the contig name, but what happens when there is more than one gene of interest on one contig? I use names like k121_4_flag_1_multi_5.0000_len_622-1 and k121_4_flag_1_multi_5.0000_len_622-2; that way I have both the gene ID and the full contig name. Second, if you are estimating gene abundance from DNA data, you will actually get more robust estimates by using the coverage of the whole contig to approximate the gene coverage. This way, things like GC bias and random read drop-in and drop-out won't affect it. The core assumption here is that the CPM (counts per million) abundance of a gene is identical to that of the contig that carries it, which is true for metagenomic (DNA) data.
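Both suggestions can be sketched in a few lines of Python. This is a hypothetical illustration, not metaWRAP code: `unique_gene_ids` builds `<contig>-<n>` IDs as described above, and `gene_abundance` gives each gene the abundance of the contig that carries it. For the per-contig value it uses a TPM-style normalization (reads per base, rescaled to sum to one million); that choice is an assumption swapped in for illustration, and the gene simply inherits whatever per-contig measure you supply.

```python
from collections import defaultdict


def unique_gene_ids(contig_ids):
    """Build one unique ID per gene: '<full contig name>-<n>'.

    `contig_ids` lists, in order, the contig carrying each gene of interest;
    n counts genes on the same contig, starting at 1.
    """
    per_contig = defaultdict(int)
    gene_ids = []
    for contig in contig_ids:
        per_contig[contig] += 1
        gene_ids.append(f"{contig}-{per_contig[contig]}")
    return gene_ids


def per_million_coverage(read_counts, lengths):
    """TPM-style contig abundance: reads per base, rescaled so all contigs sum to 1e6."""
    rates = {contig: read_counts[contig] / lengths[contig] for contig in read_counts}
    total = sum(rates.values())
    return {contig: rate / total * 1e6 for contig, rate in rates.items()}


def gene_abundance(gene_ids, contig_abundance):
    """Give each gene the abundance of its contig (the ID minus the trailing '-<n>')."""
    return {gene: contig_abundance[gene.rsplit("-", 1)[0]] for gene in gene_ids}
```

Splitting on the last `-` recovers the contig name even if the contig name itself contains hyphens, because the gene counter is always appended last.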
Gherman, sorry for not describing the files. The files I sent already take all of that into account: I renamed all contigs to k121xxxx, and I used the gene references to retrieve the contigs they belong to. So each gene file actually contains the contigs the genes come from, named k121xxx, and even if there are two genes on one contig (say, the RuBisCO small and large subunits), the file will contain that contig only once. My question is whether it is a problem when the same contig appears in different files (bins). I assume the answer is no, since there is a way to tell whether contigs are shared among files (bins). That is why I don't understand why this isn't working...
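Whether the same contig really does appear in several bin files can be checked directly. A minimal sketch, assuming the bins are FASTA files and that the contig name is the first token of each header line; `shared_contigs` is a hypothetical helper, and it says nothing about how metaWRAP itself treats such contigs.

```python
from collections import defaultdict
from pathlib import Path


def fasta_ids(path):
    """Return the set of sequence IDs (first token after '>') in a FASTA file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def shared_contigs(fasta_paths):
    """Return {contig_id: sorted file names} for contigs found in more than one file."""
    owners = defaultdict(list)
    for path in fasta_paths:
        # fasta_ids returns a set, so a contig repeated within one file counts once.
        for contig in fasta_ids(path):
            owners[contig].append(Path(path).name)
    return {contig: sorted(files) for contig, files in owners.items() if len(files) > 1}
```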
(Email reply to Gherman V. Uritskiy's comment of Wed, May 29, 2019 at 8:28 PM.)
Hi, I have had this error over and over again. Thank you for any help. Best,