KCCG / ClinSV

Robust detection of clinically relevant structural and copy number variation from whole genome sequencing data

lumpy.caller.joined_1-15.sh: line 6: readLArr[]: bad array subscript #22

Closed halessi closed 2 years ago

halessi commented 2 years ago

Hi all,

Thanks for building such a cool tool.

I am running ClinSV in Singularity and received an error which I believe has not been addressed before, and I don't think it's due to path issues.

Perhaps it is a result of how my sample files are labeled in SampleInfo, but I don't think so. Here is the error stack:

 ### executing: sh /data/LABGROUP/project_folder_w_ped/SVs/joined/lumpy/sh/lumpy.caller.joined_1-15.sh &> /data/LABGROUP/project_folder_w_ped/SVs/joined/lumpy/sh/lumpy.caller.joined_1-15.e  ...  

 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 256

 ***** error exist status != 0 (256), please check /data/LABGROUP/project_folder_w_ped/SVs/joined/lumpy/sh/lumpy.caller.joined_1-15.e for more information

And then in that file:

+ export PATH=/opt/clinsv/bin:/opt/clinsv/bin:/opt/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/opt/clinsv/bin:/opt/clinsv/bin:/opt/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ declare -A meanArr
+ declare -A stdevArr
+ declare -A readLArr
++ samtools view /data/LABGROUP/project_folder_w_ped/alignments/bqsr-cleaned-OtC5058/bqsr-cleaned-OtC5058.bam 1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ sort -rn
++ awk '(NR==1){print}'
[E::hts_open_format] fail to open file '/data/LABGROUP/project_folder_w_ped/alignments/bqsr-cleaned-OtC5058/bqsr-cleaned-OtC5058.bam'
samtools view: failed to open "/data/LABGROUP/project_folder_w_ped/alignments/bqsr-cleaned-OtC5058/bqsr-cleaned-OtC5058.bam" for reading: No such file or directory
+ readLArr[]=
/data/LABGROUP/project_folder_w_ped/SVs/joined/lumpy/sh/lumpy.caller.joined_1-15.sh: line 6: readLArr[]: bad array subscript

All prior steps were successfully run. Thank you in advance for your help.

Hugh

halessi commented 2 years ago

Note that there is no file at the path samtools was looking for ->

the only options in project_folder_w_ped/alignments/ for the sample were

Is there a missing "/" in the file path and it's trying to access the directory at /alignments/bqsr-cleaned-OtC5058.bam/?

halessi commented 2 years ago

Okay, so I fixed the "file not found" part by linking the files manually, because for whatever reason that step didn't run at the start like it does for the example BAM file, but the "bad array subscript" error persists.

The "samtools view" command runs (at least when I copy and paste into cmdline)


I did look at the issue "ClinSV v1.0 has issue with lumpy-sv".

I thought perhaps my BAMs were formatted incorrectly; however, running

samtools view /data/LABGROUP/project_folder_w_ped/alignments/bqsr-cleaned-OtC5058/bqsr-cleaned-OtC5058.bam --header-only

yields no errors and prints to stdout, so please help! Thank you so much.

MinocheAE commented 2 years ago

Hi Halessi,

thank you for your interest in ClinSV.

If readLArr[] is empty, then no samples were detected from the bam's readgroup definition.

In your first attempt to run ClinSV on your sample, a sampleInfo.txt file was probably created in the project folder. Because the BAM file was potentially not read correctly on that first attempt, this file might be empty. When ClinSV is re-run, it first checks for the presence of this file and reads it instead of parsing the BAM files again. Delete sampleInfo.txt and retry running ClinSV.

The readgroup definition in the bam header should contain information for SM, ID and LB. By comparing sampleInfo.txt of the test-bam run, you will see if your bam was parsed correctly.

If your readgroup definition does not contain SM, ID and LB, you might have to adjust the header.
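A minimal sketch of that header check, assuming the header text has already been captured (for example from `samtools view -H`). The function name `missing_rg_tags` and the sample header are hypothetical and not part of ClinSV.

```python
REQUIRED_TAGS = ("ID", "SM", "LB")


def missing_rg_tags(header_text, required=REQUIRED_TAGS):
    """Return (read group ID, missing tags) for each @RG line lacking a required tag."""
    problems = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after "@RG" are tab-separated TAG:VALUE pairs.
        tags = {}
        for field in line.split("\t")[1:]:
            key, sep, value = field.partition(":")
            if sep:
                tags[key] = value
        missing = [tag for tag in required if not tags.get(tag)]
        if missing:
            problems.append((tags.get("ID", "<no ID>"), missing))
    return problems


# Hypothetical header: the second read group has no LB tag.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:31\tSM:sampleA\tLB:lib1\n"
    "@RG\tID:32\tSM:sampleA\n"
)
for rg_id, missing in missing_rg_tags(example_header):
    print("read group %s is missing: %s" % (rg_id, ", ".join(missing)))
```

An empty result only means that every `@RG` line carries the three tags; a header with no `@RG` lines at all also returns an empty list, so that case needs a separate check.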

I hope this helps.

halessi commented 2 years ago

Thank you very much for your reply.

Interestingly, I have had to run ClinSV twice each time -- first, it detects the samples & creates a project folder + sampleInfo.txt, but then it hangs. Slurm outputs "complete job", but the log never prints anything after ### run jobs ###.

When I run it a second time, it detects the sampleInfo.txt and executes properly, leading to the error here.

I'm guessing this is the source of the issue. What can I do to make it run properly the first time? I suspect your solution will work, if we can get past the "hanging" or whatever is happening.

Has anyone else encountered this? For context, I am using Singularity. I can post an example output in a sec. Thank you for your help.

drmjc commented 2 years ago

We'll keep an eye on this over in the major refactor thread (#13).

halessi commented 2 years ago

Hmm. Thanks for the reply.

I've since made a few attempts to get around this issue --

Specifically, if I run ClinSV in singularity & feed it a sample file, everything runs okay until running this command in the log:

project_folder_using_separate_sampleInfo/SVs/joined/lumpy/sh/lumpy.caller.joined_1-15.sh

This fails with "No such file or directory", because there is no BAM at that path when samtools tries to read it.

+ meanArr[31]=
+ stdevArr[31]=
++ samtools view /data/LAB_FOLDER/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam 1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ sort -rn
++ awk '(NR==1){print}'
[E::hts_open_format] fail to open file '/data/LAB_FOLDER/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam'
samtools view: failed to open "/data/LAB_FOLDER/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam" for reading: No such file or directory
+ readLArr[32]=

When I feed a sample file to ClinSV, it does not run the ln -s to link the BAMs to the project folder. If I don't feed a sampleInfo, it fails at the ln -s step.
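For reference, a minimal sketch of linking a BAM by hand into the layout that appears in the logs above (`<project>/alignments/<sample>/<sample>.bam`). This layout is inferred from the logged paths only, the `.bam.bai` index name is an assumption, and `link_bam_into_project` is a hypothetical helper, not a ClinSV function.

```python
import os


def link_bam_into_project(bam_path, project_dir, sample_id):
    """Symlink a BAM, and its .bam.bai index if present, into
    <project_dir>/alignments/<sample_id>/ and return the links created."""
    target_dir = os.path.join(project_dir, "alignments", sample_id)
    os.makedirs(target_dir, exist_ok=True)
    link = os.path.join(target_dir, sample_id + ".bam")
    # Assumption: the index sits next to the BAM as <name>.bam.bai.
    pairs = [(bam_path, link), (bam_path + ".bai", link + ".bai")]
    created = []
    for src, dst in pairs:
        # Skip missing sources and never overwrite an existing link or file.
        if os.path.exists(src) and not os.path.lexists(dst):
            os.symlink(os.path.abspath(src), dst)
            created.append(dst)
    return created
```

The region query in the log (`1:1000000-1100000`) needs an index, which is why the sketch links it as well. When running under Singularity, the link target must also be visible inside the container, otherwise the link resolves to nothing.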

I tried manually linking them to where I think ClinSV wants them, but I get this error:

++ sort -rn
++ samtools view /data/[LAB_FOLDER_REMOVED_FOR_PRIVACY]/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam 1:1000000-1100000
++ awk '(NR==1){print}'
++ cut -f 10
+ readLArr[31]=151
+ read -r mean stdev
++ samtools view -r 31 /data/[LAB_FOLDER_REMOVED_FOR_PRIVACY]/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam
++ python /opt/clinsv/clinSV/scripts/pairend_distro-a1.py -r 151 -X 2 -N 100000 -o /data/[LAB_FOLDER_REMOVED_FOR_PRIVACY]/project_folder_using_separate_sampleInfo/SVs/bqsr-cleaned-SAMPLE/lumpy/31.pe.histo
++ cut -d : -f 2
/opt/clinsv/python/lib/python2.7/site-packages/numpy-1.12.1-py2.7-linux-x86_64.egg/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/opt/clinsv/python/lib/python2.7/site-packages/numpy-1.12.1-py2.7-linux-x86_64.egg/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/opt/clinsv/clinSV/scripts/pairend_distro-a1.py", line 106, in <module>
    (removed, upper_cutoff))
TypeError: %d format: a number is required, not numpy.float64
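The `Mean of empty slice` warning suggests that no insert sizes reached the script, which would happen if `samtools view -r 31` returned no reads for read group `31`. The mean is then NaN, which is consistent with the `%d` formatting failure above. Below is a minimal sketch of that failure mode with an explicit guard; it is not the code of `pairend_distro-a1.py`, and the function name `insert_size_stats` is hypothetical.

```python
import math


def insert_size_stats(sizes):
    """Return (mean, population stdev) of insert sizes.

    Raises a descriptive error instead of silently producing NaN
    when no insert sizes were collected.
    """
    if not sizes:
        raise ValueError(
            "no insert sizes collected; check that the read group ID "
            "passed to `samtools view -r` exists in the BAM header"
        )
    mean = sum(sizes) / float(len(sizes))
    variance = sum((s - mean) ** 2 for s in sizes) / float(len(sizes))
    return mean, math.sqrt(variance)


try:
    insert_size_stats([])
except ValueError as err:
    print("guard triggered: %s" % err)
```

With a guard like this, an empty read group fails at the point where the input goes missing rather than later in a formatting call.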

Any insight would be MUCH MUCH appreciated! Thank you

drmjc commented 2 years ago

Are you using GRCh38? We've made lots of progress in a new branch and dockerfile + image that supports GRCh38 (see #13). Can you make a singularity image from the latest docker image & test again?

halessi commented 2 years ago

Closing because I believe all of these errors are a result of the failure at the ln -s step at the start, as referenced in another issue.

23