erasmus-center-for-biomics / Nimbus

The Nimbus software suite for the analysis of amplicon based NGS data
MIT License

align failed #5

Open booew opened 6 years ago

booew commented 6 years ago

mydata.zip

I used Nimbus to analyze my data with a BED file and paired-end FASTQ data, using the hg38 reference. I checked my FASTQ data, and the read ends match the BED file. However, after running Nimbus I get the message: Loaded 0 bases in 0 amplicons. It seems the alignment failed. #2

booew commented 6 years ago

It seems that the tool does not support hg38; please use hg19.

erasmus-center-for-biomics commented 6 years ago

Dear booew,

The standard GRCh38 reference sequence includes text after the chromosome qualifiers. For example ">chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38"

Everything after the chrN should be removed from the header rows. This can be done with the following command: cat Homo_sapiens_assembly38.fasta | sed '/^>/ s/ .*$//' > Homo_sapiens_assembly38.clean.fasta

With the Homo_sapiens_assembly38.clean.fasta file, 99 amplicons can be loaded.
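
As a quick sanity check before running Nimbus, the chromosome names in the design can be compared against the FASTA headers. The sketch below is not part of Nimbus: the function name `missing_in_fasta` and the file name `design.bed` are hypothetical, and it assumes that the whole header line after ">" is treated as the sequence name, which is what the behaviour described above suggests but does not confirm.

```shell
# Print each BED chromosome name that has no exactly matching FASTA header.
# Assumption: the full header text after ">" is the sequence name, so a
# header such as ">chr1 AC:CM000663.2 ..." does not match "chr1".
missing_in_fasta() {
    # $1 = FASTA file, $2 = BED file (tab-separated, chromosome in column 1)
    awk -F '\t' '
        # First file: remember every header line without its leading ">".
        FILENAME == ARGV[1] { if (/^>/) seen[substr($0, 2)] = 1; next }
        # Second file: skip track/browser/comment lines and empty lines.
        /^(track|browser|#)/ || NF == 0 { next }
        # Report each unmatched chromosome name once.
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$1" "$2"
}

# Example call (file names are placeholders):
#   missing_in_fasta Homo_sapiens_assembly38.clean.fasta design.bed
```

If the function prints nothing, every chromosome named in the design has a matching header. Any name it prints points to a header that still carries extra text or to a naming difference between the design and the reference.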

I assume the design attached to the message is not complete, as only 36 reads aligned. Alternatively, the amplicon design could be based on hg19 (the standard genome at Agilent) while the reference is GRCh38.

With kind regards,

Rutger