bergmanlab / TELR

TELR is a fast non-reference transposable element detector from long read sequencing data.
https://github.com/bergmanlab/TELR
BSD 2-Clause "Simplified" License
31 stars, 11 forks

[Errno 7] Argument list too long: '/bin/sh' #31

Open Morriyaty opened 1 year ago

Morriyaty commented 1 year ago

Hi,

Thanks a lot for such a wonderful program.

But when I ran it with the command telr -i /opt/synData/sample.bam -l ../04.tldr/rename.library.fa -r ../04.tldr/LG.fasta -o output -t 40, I got the following error: [Errno 7] Argument list too long: '/bin/sh'. The full log file is attached below.

How can I fix it?

Best, Yinjia

telr.log

shunhuahan commented 1 year ago

Hi @wyj-lzu,

Thanks for reporting this issue and sorry for my late reply!

Looks like the error occurred within this function https://github.com/bergmanlab/TELR/blob/406a704a7094aa1338bf40465304bef332deda1d/src/telr/TELR_assembly.py#L369.

My current theory is that the error was raised at this line https://github.com/bergmanlab/TELR/blob/406a704a7094aa1338bf40465304bef332deda1d/src/telr/TELR_assembly.py#L454, where TELR uses csplit to split the raw FASTA file into many small FASTA files, one for each insertion locus. The index argument is a list of numbers that TELR passes to csplit to do the splitting (each number corresponds to the number of lines in one small FASTA file). When the list contains too many numbers (too many insertion candidates), the command can become too long, which might cause this "Argument list too long" error.
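As a rough illustration of one way to sidestep the limit (not necessarily the fix TELR will adopt), the splitting could be done in Python itself, so that no long list of numbers is ever passed on a command line. The function name split_by_line_counts, the "chunk" prefix, and the ".fa" extension are hypothetical, and the sketch assumes each number is the line count of one output file, as described above:

```python
from itertools import islice
from pathlib import Path


def split_by_line_counts(src, line_counts, out_dir, prefix="chunk"):
    """Write consecutive blocks of lines from `src` to separate files.

    line_counts[i] is the number of lines that go into the i-th output file
    (an assumption about how the index list is laid out).
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Zero-pad the file index so the output files sort in order.
    width = len(str(max(len(line_counts) - 1, 0)))
    written = []
    with open(src) as handle:
        for i, n_lines in enumerate(line_counts):
            # Read the next n_lines lines; the file is streamed only once.
            block = list(islice(handle, n_lines))
            if not block:
                break  # input exhausted before all counts were used
            out_path = out_dir / f"{prefix}{i:0{width}d}.fa"
            out_path.write_text("".join(block))
            written.append(out_path)
    return written
```

Because the counts stay in a Python list, their number is bounded by memory rather than by the operating system's limit on argument size.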

To help with debugging, I created a new branch of TELR that catches the csplit error (https://github.com/bergmanlab/TELR/tree/long_arg_fix). It would be helpful if you could switch to this new branch, activate the TELR conda environment, install TELR locally with pip, re-run TELR with the --keep_files option, and share the log file. That way we can confirm that the issue is caused by csplit and work on a fix.

cd TELR # this is the TELR git folder
git pull
git checkout long_arg_fix # switch to new branch
conda activate TELR # activate the TELR environment
pip install . # install the latest update locally

# run TELR on the input data with --keep_files option
telr --keep_files ...

If the input files you used for running TELR are not too large, you could also consider sharing them (or a subset of the data that reproduces the error) through Google Drive with hanshunhua0829@gmail.com.

Best, Shunhua

Morriyaty commented 1 year ago

When I ran the commands you mentioned above, I got a similar error.

My BAM file is pretty large (67G). How can I share it with you? By the way, the BAM file was generated with ngmlr.

telr.log

shunhuahan commented 1 year ago

Hi @Morriyaty,

Thanks for the rerun! With your latest log file, I was able to confirm that the issue is caused by the csplit command I described at https://github.com/bergmanlab/TELR/issues/31#issuecomment-1445203296. No need to share the input file. I will make an update to TELR and let you know once it's ready!

Best, Shunhua