DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

Bug in parseFasta ?? #21

ptranvan closed this issue 8 years ago

ptranvan commented 8 years ago

Hi,

Using that command:

blobtools create -i sm.scafSeq -t assembly_se_uniref.daa.tagc -t assembly_se_nt.blastn -y soap --nodes nodes.dmp --names names.dmp

I got an error:

[STATUS] : Parsing FASTA - sm.scafSeq
Traceback (most recent call last):
  File "blobtools/create.py", line 56, in <module>
    blobDb.parseFasta(fasta_f, fasta_type)
  File "blobtools/lib/BtCore.py", line 265, in parseFasta
    cov = BtIO.parseCovFromHeader(fasta_type, blObj.name)
  File "blobtools/lib/BtIO.py", line 146, in parseCovFromHeader
    return float(temp[2]/(temp[1]+1-75))

Do you have a solution?

DRL commented 8 years ago

Hi ptranvan,

Is sm.scafSeq a FASTA file generated by soap? Could you send me a selection of the headers from the FASTA file so I can see what is going on there?

cheers,

dom

ptranvan commented 8 years ago

Thanks for your quick answer.

It was assembled with SOAPdenovo2. The headers look like this:

scaffold1 43.4
scaffold2 36.2
scaffold3 12.8

DRL commented 8 years ago

Hi ptranvan,

We don't usually use either soap or abyss, so I wasn't aware that this didn't work properly.

Due to the structure of the code and the general behaviour of other bioinformatics tools, there is no easy way of fixing the parsing for abyss and soap (AFAIK they are the only assemblers that have spaces in their headers).

Hence I will drop support for coverage parsing for these two assemblers.
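To illustrate why spaces in headers are awkward: many tools take only the text up to the first whitespace as the sequence ID, so anything after it (here, the coverage) is treated as a free-form description. The following is a minimal Python sketch of that convention, assuming headers of the form posted earlier in this thread. `split_header` is a hypothetical helper for illustration and is not part of blobtools.

```python
def split_header(header_line):
    """Split a FASTA header into (sequence ID, remaining description).

    Hypothetical helper: mimics the common convention of using the first
    whitespace-delimited token as the sequence ID.
    """
    text = header_line.lstrip(">").strip()
    parts = text.split(None, 1)  # split on the first run of whitespace only
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


# The coverage ends up in the description, not in the ID.
print(split_header(">scaffold1 43.4"))
```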

In order to run blobtools with your assembly you can just generate a simple coverage file using the following one-liner:

grep '^>' assembly.fna | sed 's/^>//g' | awk '{print $1 "\t" $2}' > assembly.cov

and then just provide the cov file with -c assembly.cov (without the option -y soap)
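If a script is easier to adapt than the shell pipeline, the same conversion can be sketched in Python. This is only an illustration under the assumption that each header has the form `>name coverage`. The function name `headers_to_cov` and the script name `make_cov.py` are made up for this example and are not part of blobtools.

```python
import sys


def headers_to_cov(fasta_lines):
    """Yield 'name<TAB>coverage' for each FASTA header line.

    Assumes headers look like '>scaffold1 43.4' (name, space, coverage).
    """
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if len(fields) < 2:
            continue  # unlike the awk version, skip headers with no coverage field
        yield f"{fields[0]}\t{fields[1]}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python make_cov.py assembly.fna assembly.cov
    with open(sys.argv[1]) as fasta, open(sys.argv[2], "w") as cov:
        for row in headers_to_cov(fasta):
            cov.write(row + "\n")
```

The only intended difference from the one-liner is that headers without a second field are skipped rather than written with an empty coverage column.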

cheers,

dom