Open kunaljaani opened 4 months ago
Hi Kunal, thanks again for reporting this issue and apologies for my delay in responding.
I believe that this problem occurs due to the fact that contig headers are not guaranteed to be unique across MAGs. It is unlikely, but evidently it is possible that two different contigs are assembled with the same node_length_coverage values across different genomes from the same sample, in which case you will see those errors you have reported. I will have to think about the best way to deal with this in the future, but for now you can simply add a unique MAG ID at the start of each contig header when concatenating MAGs to avoid this issue.
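To confirm that duplicated contig headers are really the cause, you could list the contig IDs that appear more than once across the MAGs of one sample. This is only a minimal sketch: `list_duplicate_headers` is a hypothetical helper (not part of the pipeline), and it assumes the contig ID is the first whitespace-delimited word of each FASTA header.

```shell
# Hypothetical helper: print contig IDs that occur more than once
# across the FASTA files given as arguments.
list_duplicate_headers() {
    # On header lines, take the first word and drop the leading ">",
    # then keep only the IDs that are repeated.
    awk '/^>/ {print substr($1, 2)}' "$@" | sort | uniq -d
}
```

For example, `list_duplicate_headers mags/*.fa` (assuming the MAGs sit in a `mags` subfolder with a `.fa` extension) should print nothing for samples that ran fine and one or more contig IDs for samples that failed.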
Instead of using `cat`, use a loop to prepend the MAG ID to each contig header, e.g. assuming your MAGs for a given sample are in the subfolder `mags` with the `.fa` extension:
ls mags | grep '\.fa$' | while read genome; do sed "s/^>/>${genome%.*}_/" "mags/$genome"; done > sample.fa
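The same idea can be wrapped in a small function, which makes it easier to reuse across samples. This is a sketch under assumptions: `concat_mags_with_prefix` is a hypothetical name, the file name without `.fa` is used as the MAG ID, and MAG file names are assumed not to contain `/` or `&`, which would interfere with the `sed` replacement.

```shell
# Hypothetical helper: concatenate every *.fa file in a directory,
# prefixing each contig header with the file's base name (the MAG ID).
# The body runs in a subshell so its variables do not leak to the caller.
concat_mags_with_prefix() (
    dir=$1
    for genome in "$dir"/*.fa; do
        [ -e "$genome" ] || continue          # glob matched nothing
        mag_id=$(basename "$genome" .fa)      # e.g. mags/bin.1.fa -> bin.1
        sed "s/^>/>${mag_id}_/" "$genome"     # only touch the leading ">"
    done
)
```

Usage would look like `concat_mags_with_prefix mags > sample.fa`. As a sanity check afterwards, `grep -c '^>' sample.fa` and `grep '^>' sample.fa | sort -u | wc -l` should report the same number if every header is now unique.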
Best wishes, Francisco
Hi Francisco,
Thank you very much for your reply and the detailed explanation. Thanks a lot, Kunal
Hi Francisco! While running `-t abundance`, I am running into a problem with the SAM to BAM conversion due to a duplicate header. When I ran `grep "NODE_1753_length_683_cov_1.602310" FDSGM6.sam -c`, I found more than one occurrence. I am also surprised to see that 2 of the 35 samples ran fine; if there were a general issue, I would have expected it to cause problems for all the samples. So I am not able to figure out the issue. Could you please suggest a solution?
Thank you Kunal