cpockrandt / genmap

GenMap - Fast and Exact Computation of Genome Mappability

Use only name of entry in fasta file for creating index #12

Closed duartemolha closed 4 years ago

duartemolha commented 4 years ago

Hi

I downloaded a common human FASTA assembly that is usually used for BWA mapping.

In it, the FASTA header for each entry looks like this:

">chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38"

If you use this file to create the GenMap index, the software takes the entire line as the name of the chromosome: "chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome"

This, in turn, means that when you run genmap map, the BED file it creates looks like this:

chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome [start] [end] [-1] [mapping score]

This is obviously not valid BED format.

The same happens with the wig file: converting the wig to bigWig with the chrom.sizes file fails, because each line of the chrom.sizes file is expected to contain only one chromosome name field and one chromosome size.

Ideally, for index creation, GenMap should take only the name from the FASTA header (i.e. "chr1") and ignore the rest of the line.

After seeing this, I modified my input FASTA file to remove the rest of the string from each header and recreated the index. This works fine, but maybe it could be supported directly in the tool?
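
For reference, a workaround of this kind could look roughly like the Python sketch below. It is only an illustration, not part of GenMap: the function name `strip_fasta_headers` and the file names in the usage comment are assumptions. It keeps the text before the first whitespace in each header line and copies sequence lines unchanged.

```python
def strip_fasta_headers(src, dst):
    """Copy FASTA text from src to dst, shortening each header to its first token.

    src and dst are open text file objects (hypothetical helper, not a GenMap API).
    """
    for line in src:
        if line.startswith(">"):
            # ">chr1 AC:CM000663.2 ..." becomes ">chr1"
            fields = line[1:].split()
            name = fields[0] if fields else ""
            dst.write(">" + name + "\n")
        else:
            # Sequence lines are passed through untouched.
            dst.write(line)


# Example usage (file names are placeholders):
#
#   with open("genome.fa") as src, open("genome.clean.fa", "w") as dst:
#       strip_fasta_headers(src, dst)
```

The index would then be rebuilt from the cleaned file, so that the BED and wig output carry only the short chromosome names.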

Thanks

Duarte

cpockrandt commented 4 years ago

It's on the master branch now, but it will probably take until January until I publish a new release.