ctSkennerton / crass

The CRISPR assembler
http://ctskennerton.github.io/crass
GNU General Public License v3.0

FATAL ERROR: parseSeqFiles failed #101

Closed: maureenbug closed this issue 3 years ago

maureenbug commented 3 years ago

Hi! I saw that people previously had this issue back in 2016, but this error recently came up for me. I only get it on certain datasets, so I'm not sure whether something is wrong with those datasets (and, if so, what I can check).

here is the version information: CRisprASSembler (crass) version 0 subversion 3 revison 12 (0.3.12)

here is the tail end of the log file

>A00178:49:H7L3HDSXX:2:2678:25129:8656 1:N:0:TCGTGGAT+ATCCACGA
GTTTCACACCGGACGAAGAACACGTCGCGTTTGTCGAGAAATACCGAAACTTTGTAATCCCCTGGCGGGGATTTAGGTGTTTCACACATATCTGCGCGGTATGATAGAGACACAGTAATCGATCTTTGTAATCCCCTGGCGGGGATTTAGG
3d 8h 21m 24s ERR WorkHorse.cpp : int WorkHorse::doWork(Vecstr) : 191: FATAL ERROR: parseSeqFiles failed

On this dataset, CRASS always fails with this error at the same sequence. I did try running CRASS on just the last sequence listed, "A00178:49:H7L3HDSXX:2:2678:25129:8656", but on its own it runs fine.

Any help is appreciated, thanks!
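Since the single read runs fine on its own, one way to narrow this down is to test the read together with its neighbours from the original file. The following is a minimal sketch, assuming the input is FASTA (as the log tail suggests); for FASTQ input the record parsing would need to change. The helper names `read_fasta` and `window_around` are hypothetical and not part of CRASS.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def window_around(records: Iterable[Record], read_id: str, flank: int = 2) -> List[Record]:
    """Return the record whose ID equals read_id plus up to `flank` records
    on each side, streaming so the whole file is never held in memory."""
    before: deque = deque(maxlen=flank)
    it = iter(records)
    for header, seq in it:
        # The ID is taken to be the header text up to the first space.
        if header.partition(" ")[0] == read_id:
            window = list(before) + [(header, seq)]
            for _ in range(flank):
                nxt = next(it, None)
                if nxt is None:
                    break
                window.append(nxt)
            return window
        before.append((header, seq))
    return []
```

For example, `window_around(read_fasta(open("reads.fa")), "A00178:49:H7L3HDSXX:2:2678:25129:8656", flank=50)` would collect the read and 50 records on either side (`reads.fa` is a placeholder file name), which could then be written out and passed to CRASS as a small test case.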

ctSkennerton commented 3 years ago

Can you update to the latest version to see if that fixes it?

maureenbug commented 3 years ago

Okay, so I updated it, and only 1 of the 5 datasets failed this time (whereas before all 5 failed), so progress?

ctSkennerton commented 3 years ago

Is the dataset public, or otherwise something I could access, so I can see if I can reproduce this?

maureenbug commented 3 years ago

it is!

https://img.jgi.doe.gov/cgi-bin/mer/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=3300031980

you can download here: https://data.jgi.doe.gov/refine-download/img?its_ap_id=1209422&expanded=1209422

ctSkennerton commented 3 years ago

Are there any smaller datasets? I don't have access to a server for personal projects anymore, and the files are too big to process on my laptop.

maureenbug commented 3 years ago

let me try to split up the data into subsets and see if I still get the same error on those subsets - then I can send one of those to you
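A rough sketch of how such a split could be done, assuming the reads are in FASTA format (FASTQ would need four-line record handling instead). The function name `split_fasta` and the output naming scheme are made up for illustration and are not part of CRASS.

```python
from typing import List


def split_fasta(in_path: str, out_prefix: str, records_per_chunk: int) -> List[str]:
    """Split a FASTA file into consecutive chunk files, each holding at most
    `records_per_chunk` records. Returns the list of chunk paths written."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")

    paths: List[str] = []
    out = None
    count = 0  # records written to the currently open chunk
    try:
        with open(in_path) as src:
            for line in src:
                if line.startswith(">"):
                    # Start a new chunk at a record boundary when needed.
                    if out is None or count == records_per_chunk:
                        if out is not None:
                            out.close()
                        path = f"{out_prefix}.part{len(paths) + 1:04d}.fa"
                        out = open(path, "w")
                        paths.append(path)
                        count = 0
                    count += 1
                # Lines before the first header (if any) are skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Each chunk can then be run through CRASS separately. Because records are never split across files and their order is preserved, concatenating the chunks gives back the original records.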

maureenbug commented 3 years ago

okay I ran CRASS on the subsets of data to see which one to send you, but they all worked??? so I guess everything is good now?

ctSkennerton commented 3 years ago

Yeah, the bug was probably triggered during graph construction, which depends on the input file. But the reads that get identified at the beginning shouldn't change too much.