ctSkennerton / crass

The CRISPR assembler
http://ctskennerton.github.io/crass
GNU General Public License v3.0

[ERROR]: FATAL ERROR: parseSeqFiles failed #82

Open shaman-narayanasamy opened 8 years ago

shaman-narayanasamy commented 8 years ago

Hi,

I attempted to run crass on my data. The command below was issued, followed by its STDOUT:

$ crass -o /scratch/users/snarayanasamy/LAO_TS_CRISPR/D20/crass_mg-reads /scratch/users/snarayanasamy/LAO_TS/D20/Preprocessing/MG.R1.trimmed.fq /scratch/users/snarayanasamy/LAO_TS/D20/Preprocessing/MG.R2.trimmed.fq /scratch/users/snarayanasamy/LAO_TS/D20/Preprocessing/MG.SE.trimmed.fq
[crass_patternFinder]: Processed 62127053 ...333 sec
[crass_clusterCore]: 20041 variants mapped to 7105 clusters
[crass_clusterCore]: creating non-redundant set
[crass_clusterCore]: 24290 non-redundant patterns.
[crass_singletonFinder]: Processed 62127053 ...285 sec
[crass_patternFinder]: Found 158218 reads
183 : 226
[ERROR]: Something wrong with front offset!
ss iter: 103
front offset: -43
Header: BUTTERS-W7D:283:BC1A93ACXX:5:2109:20394:28385
LowLexi: 0 
 43,70,103,99,
Sequence:CAGCTTAGAAAGTTCAAATAATTGCAAAGCATGGGTTGGTTAACTTCACTGCCGCATAGGCAGCTTAGAAAGTTCAAATAATTGCAAAGCATGGGTTGGT
Len: 100
---------------------------------------------
---------------------------------------------

ReadHolder.cpp : 417 : void ReadHolder::updateStartStops(int, std::string*, const options*)
[ERROR]: FATAL ERROR: parseSeqFiles failed
WorkHorse.cpp : 191 : int WorkHorse::doWork(Vecstr)

Note that I am running crass on data from multiple samples, and it only fails on data from this particular sample. The reads were preprocessed prior to analysis with crass. The machine has more than enough memory (>100 GB RAM). Any idea why this could happen?

Best regards, Shaman

ctSkennerton commented 8 years ago

What version of crass are you using?

Replying by email to Shaman's post of Jul 15, 2016, 11:05 AM.

shaman-narayanasamy commented 8 years ago

I cloned the git repo and built it. It is in the master branch.

$ crass --version

CRisprASSembler (crass)
version 1 subversion 0 revison 0 (1.0.0)

---------------------------------------------------------------
Copyright (C) 2011-2015 Connor Skennerton & Michael Imelfort
Copyright (C) 2016      Connor Skennerton
This program comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it
under certain conditions: See the source for more details
---------------------------------------------------------------

ctSkennerton commented 8 years ago

I just pushed a new commit. Can you download it again and see if it fixes the problem?

shaman-narayanasamy commented 8 years ago

Cloned the updated repo and rebuilt. Worked this time!

Not sure if this is of interest to you, but the counts are now lower than with the previous version:

$ crass -o /scratch/users/snarayanasamy/LAO_TS_CRISPR/D20/crass_mg-reads /scratch/users/snarayanasamy/LAO_TS/D20/Preprocessing/MG.R1.trimmed.fq /scratch/users/snarayanasamy/LAO_TS/D20/Preprocessing/MG.R2.trimmed.fq /scratch/users/snarayanasamy/LAO_TS/D20/Preprocessing/MG.SE.trimmed.fq

[crass_patternFinder]: Processed 62127053 ...319 sec
[crass_clusterCore]: 14143 variants mapped to 5023 clusters
[crass_clusterCore]: creating non-redundant set
[crass_clusterCore]: 16906 non-redundant patterns.
[crass_singletonFinder]: Processed 62127053 ...251 sec
[crass_patternFinder]: Found 133485 reads
[crass_graphBuilder]: 227 CRISPRs found!

Thanks a lot :)

shaman-narayanasamy commented 8 years ago

Hi, sorry about this, but now it is failing on a different dataset.

$ crass -o /scratch/users/snarayanasamy/LAO_TS_CRISPR/D44/crass_mt-reads /scratch/users/snarayanasamy/LAO_TS_IMP-v1.3/D44/Preprocessing/mt.r1.trimmed.rna_filtered.fq /scratch/users/snarayanasamy/LAO_TS_IMP-v1.3/D44/Preprocessing/mt.r2.trimmed.rna_filtered.fq /scratch/users/snarayanasamy/LAO_TS_IMP-v1.3/D44/Preprocessing/mt.se.trimmed.rna_filtered.fq

187 : 220
[ERROR]: Something wrong with front offset!
ss iter: 102
front offset: -33
Header: HWI-ST201:388:BC2KH8ACXX:6:1301:14539:10320
LowLexi: 1 
 36,71,102,99,
Sequence:TCTGACACCGTGGTCTACTTTGCATTGATGTATACGGTTGTAACTCTTCCCTGATTATTAAGGGATTAAGACACCGTGGTCTACTTTGCATTGATGTATA
Len: 100
---------------------------------------------
---------------------------------------------

ReadHolder.cpp : 417 : void ReadHolder::updateStartStops(int, std::string*, const options*)
[ERROR]: FATAL ERROR: parseSeqFiles failed
WorkHorse.cpp : 191 : int WorkHorse::doWork(Vecstr)

ctSkennerton commented 8 years ago

Hmm, it seems I only fixed a symptom of the problem with the previous commit. What is strange is that when I take just this one read and run it through crass, I don't get the error. Could you try doing the same thing? Is this a public dataset that I could also download and debug on?
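
A minimal sketch of pulling one read out of a FASTQ file so it can be run on its own, assuming plain four-line records (no wrapped sequence lines). The function name `extract_read` is hypothetical and not part of crass, and the quality string in the demo is a placeholder, not the real one from the dataset.

```python
import io


def extract_read(fastq_lines, read_id):
    """Return the four-line FASTQ record for read_id, or None if absent.

    Assumes every record is exactly four lines: header, sequence, '+', quality.
    """
    lines = iter(fastq_lines)
    for header in lines:
        record = [header.rstrip("\n")]
        # Consume the remaining three lines of this record.
        for _ in range(3):
            record.append(next(lines, "").rstrip("\n"))
        # The read name is the header text after '@', up to the first whitespace.
        fields = record[0][1:].split()
        if record[0].startswith("@") and fields and fields[0] == read_id:
            return record
    return None


# Read reported in the error output above; the quality line is a placeholder.
READ_ID = "HWI-ST201:388:BC2KH8ACXX:6:1301:14539:10320"
SEQ = "TCTGACACCGTGGTCTACTTTGCATTGATGTATACGGTTGTAACTCTTCCCTGATTATTAAGGGATTAAGACACCGTGGTCTACTTTGCATTGATGTATA"

demo_file = io.StringIO(
    "@some_other_read\nACGT\n+\nIIII\n"
    f"@{READ_ID}\n{SEQ}\n+\n{'I' * len(SEQ)}\n"
)
record = extract_read(demo_file, READ_ID)
print("\n".join(record))
```

The four lines it returns could be written to a small file and passed to crass as its only input.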

ctSkennerton commented 8 years ago

Also, can you send me the full log file of the failed run?

ghost commented 7 years ago

Hi, following up on the same thread. I get a very similar error, but only with certain datasets.

crass: /opt/apps/resif/devel/v1.1-20150414/core/software/tools/cURL/7.46.0-ictce-7.3.5/lib/libcurl.so.4: no version information available (required by /home/users/snarayanasamy/xerces-c-3.1.4/lib/libxerces-c-3.1.so)
[ERROR]: Header: ERR843255.51624
LowLexi: 96
 4,31,69,96,134,161,
Sequence:GAGCGTTCAGATTCCTCTATGGACAATGGTAACATTTGTAGCCTGGGGTTTGATCTCTTTTTGAGTGGAGTTCAGATTCCTCTATGGACAATGGTAACTTTAAGCGATATACACAAAAGGGGATATAATATACTGTTCAGATTCCTCTATGGACAATAGAT
Len: 161
---------------------------------------------
---------------------------------------------
ss list out of range; 161 > 160
ReadHolder.cpp : 924 : bool ReadHolder::getNextSpacer(std::string*)
[ERROR]:
ReadHolder.cpp : 235 : void ReadHolder::getAllSpacerStrings(std::vector<std::basic_string<char> >&)
[ERROR]: Fatal error in search algorithm!
libcrispr.cpp : 146 : int searchFile(const char*, const options&, ReadMap*, StringCheck*, lookupTable&, lookupTable&, time_t&)
[ERROR]: FATAL ERROR: parseSeqFiles failed
WorkHorse.cpp : 191 : int WorkHorse::doWork(Vecstr)

Here is what the log file says:

----------------------------------------------------------------------
-- 10/05/2017_04:38  --  CRisprASSembler (crass) --  Version: 1.0.0 --
----------------------------------------------------------------------
----------------------------------------------------------------------

0s      I   Parsing reads in 1 files
0s      I   Parsing file: ../../imp_results/biogas/Preprocessing/mg.r1.trimmed.fq
0s      ERR ReadHolder.cpp : bool ReadHolder::getNextSpacer(std::string*) : 924: Header: ERR843249.29011
LowLexi: 96
 7,34,70,97,134,161,
Sequence:ATTTATAACTTTTAATCGCACCATAAGGAATTGAAATTGGATAACTAATGAACTTGAGATGAGTAAAAAGACTTTTAATCGCACCATAAGGAATTGAAATATTTACAGAAACGATGATGGCTATGAGTATAAGTTCTTTTAATCGCACCATAAGGAATTGA
Len: 161
---------------------------------------------
---------------------------------------------
ss list out of range; 161 > 160
0s      ERR WorkHorse.cpp : int WorkHorse::doWork(Vecstr) : 191: FATAL ERROR: parseSeqFiles failed
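
A minimal sketch of the kind of bounds check that the `ss list out of range; 161 > 160` line suggests, assuming the comma-separated numbers are 0-based positions within the read, that the largest valid index is `Len - 1`, and that positions are expected in non-decreasing order. The function `check_start_stops` is hypothetical and is not taken from the crass source.

```python
def check_start_stops(positions, read_length):
    """Return a list of problems found in a start/stop list (empty if none).

    Assumption: positions are 0-based indices into a read of read_length
    bases and should never decrease from one entry to the next.
    """
    problems = []
    last_index = read_length - 1
    for i, pos in enumerate(positions):
        if pos < 0 or pos > last_index:
            problems.append(f"position {pos} outside 0..{last_index}")
        if i > 0 and pos < positions[i - 1]:
            problems.append(f"position {pos} follows larger position {positions[i - 1]}")
    return problems


# Lists copied from the error output in this thread.
for positions, length in [
    ([7, 34, 70, 97, 134, 161], 161),
    ([43, 70, 103, 99], 100),
]:
    print(positions, length, check_start_stops(positions, length))
```

Under these assumptions, the 161-base reads trip only the upper bound, while the earlier 100-base read trips both the bound (103) and the ordering (99 after 103).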

I am also providing one of the datasets that fails (preprocessed metatranscriptomic reads): https://www.dropbox.com/s/sy8ywj1s1uqe3zb/mt.r1.trimmed.rna_filtered.fq?dl=0

Another issue I am facing is that after a git pull, the reported version remains 1.0.0:

crass: /opt/apps/resif/devel/v1.1-20150414/core/software/tools/cURL/7.46.0-ictce-7.3.5/lib/libcurl.so.4: no version information available (required by /home/users/snarayanasamy/xerces-c-3.1.4/lib/libxerces-c-3.1.so)

CRisprASSembler (crass)
version 1 subversion 0 revison 0 (1.0.0)

---------------------------------------------------------------
Copyright (C) 2011-2015 Connor Skennerton & Michael Imelfort
Copyright (C) 2016      Connor Skennerton
This program comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it
under certain conditions: See the source for more details
---------------------------------------------------------------

Thanks in advance for your help,

Best regards, Susana

ctSkennerton commented 7 years ago

Hi Susana,

I've downloaded the file you linked on dropbox but on my computer it runs fine without any errors. Did you change any of the command line options from their default?

ghost commented 7 years ago

Hi, I have not changed the default parameters yet. Actually, it works with other datasets (with both default and non-default parameters). Maybe the issue is that I am using an old version? After I do git pull, the version does not seem to be updated.

$ git checkout v1.0.1
Note: checking out 'v1.0.1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 19f73f5... add in additional test in extendPreRepeat
$ crass --version
crass: /opt/apps/resif/devel/v1.1-20150414/core/software/tools/cURL/7.46.0-ictce-7.3.5/lib/libcurl.so.4: no version information available (required by /home/users/snarayanasamy/xerces-c-3.1.4/lib/libxerces-c-3.1.so)

CRisprASSembler (crass)
version 1 subversion 0 revison 0 (1.0.0)

---------------------------------------------------------------
Copyright (C) 2011-2015 Connor Skennerton & Michael Imelfort
Copyright (C) 2016      Connor Skennerton
This program comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it
under certain conditions: See the source for more details
---------------------------------------------------------------
$ crass
crass: /opt/apps/resif/devel/v1.1-20150414/core/software/tools/cURL/7.46.0-ictce-7.3.5/lib/libcurl.so.4: no version information available (required by /home/users/snarayanasamy/xerces-c-3.1.4/lib/libxerces-c-3.1.so)
Compiler Options:
RENDERING = 0
DEBUG = 0
MEMCHECK = 0
ASSEMBER = 1
VERBOSE_LOGGER = 0
Search Debugger =  0

Usage:  crass  [options] { inputFile ...}

What do you suggest? Thanks,

ctSkennerton commented 7 years ago

I don't think that's a problem. I just forgot to update the version number in one of the files, so it won't change the results at all. Sorry for the confusion. It may be that your operating system uses a slightly different setup from the one I'm using, which may be causing the issue. Can you tell me what operating system you are using and which version of the C++ compiler you have on your system?
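
A minimal sketch of spotting this kind of mismatch, assuming the `--version` output keeps the `(1.0.0)` form shown above and that tags look like `v1.0.1`. The helper names are hypothetical; the sketch only compares strings and does not call crass or git.

```python
import re


def reported_version(version_output):
    """Extract (major, minor, patch) from text such as '... (1.0.0)'."""
    match = re.search(r"\((\d+)\.(\d+)\.(\d+)\)", version_output)
    return tuple(int(part) for part in match.groups()) if match else None


def tag_version(tag):
    """Extract (major, minor, patch) from a tag such as 'v1.0.1'."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    return tuple(int(part) for part in match.groups()) if match else None


# Text copied from the --version output quoted in this thread.
output = "CRisprASSembler (crass)\nversion 1 subversion 0 revison 0 (1.0.0)"
binary = reported_version(output)
tag = tag_version("v1.0.1")
if binary != tag:
    print(f"binary reports {binary}, checked-out tag is {tag}")
```

When the version string lags behind like this, the commit hash printed by `git rev-parse HEAD` in the source directory is a more reliable record of what was built.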

ghost commented 7 years ago

C++ compiler:

$ gcc --version
gcc (Debian 4.7.2-5) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Operating system:

$ uname -a
Linux access.gaia-cluster.uni.lux 3.2.0-4-amd64 #1 SMP Debian 3.2.82-1 x86_64 GNU/Linux

ctSkennerton commented 7 years ago

I'm still not able to replicate this error on the file you gave me. I wasn't able to download your exact operating system, but I did recompile my copy of the code with the same version of gcc that you are using. I'm really sorry about this, but I don't know why it's giving you an error and not me. Perhaps you could try it on another computer.

ghost commented 7 years ago

Ok, thanks a lot! I have reinstalled crass, and I do not get any error now.