Closed halessi closed 2 years ago
Note that there is no file at the path samtools was looking for ->
the only options in project_folder_w_ped/alignments/ for the sample were
Is there a missing "/" in the file path and it's trying to access the directory at /alignments/bqsr-cleaned-OtC5058.bam/?
Okay, so I fixed the "File not found" part by linking the files, because for whatever reason that step didn't run at the start like it does for the example BAM file, but the "bad array subscript" error persists.
The "samtools view" command runs (at least when I copy and paste into cmdline)
I did look at the issue "ClinSV v1.0 has issue with lumpy-sv".
I thought perhaps my BAMs were formatted incorrectly; however, running
samtools view /data/LABGROUP/project_folder_w_ped/alignments/bqsr-cleaned-OtC5058/bqsr-cleaned-OtC5058.bam --header-only
yields no errors & prints to stdout, so please help! thank you so much
Hi Halessi,
thank you for your interest in ClinSV.
If readLArr[] is empty, then no samples were detected from the bam's readgroup definition.
Your first attempt to run ClinSV on your sample probably created a sampleInfo.txt file in the project folder. Because the BAM file may not have been read correctly on that first attempt, this file might be empty. When ClinSV is re-run, it first checks for this file and, if it is present, reads it instead of parsing the BAM files again. Delete it and retry running ClinSV.
The readgroup definition in the BAM header should contain information for SM, ID and LB. By comparing your sampleInfo.txt with the one from the test-BAM run, you will see whether your BAM was parsed correctly.
If your readgroup definition does not contain SM, ID and LB, you might have to adjust the header.
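A quick way to check the header for the three tags mentioned above is sketched below. This is only an illustration and not part of ClinSV: the function name `missing_rg_tags` and the example header are made up, and the function works on header text that you have already obtained, for instance from the `samtools view ... --header-only` command quoted earlier in this thread.

```python
REQUIRED_TAGS = ("ID", "SM", "LB")


def missing_rg_tags(header_text, required=REQUIRED_TAGS):
    """Return {read-group label: [missing tags]} for the @RG lines of a SAM header."""
    problems = {}
    rg_count = 0
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        rg_count += 1
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        tags = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        missing = [tag for tag in required if not tags.get(tag)]
        if missing:
            label = tags.get("ID", "<no ID, @RG line %d>" % rg_count)
            problems[label] = missing
    if rg_count == 0:
        problems["<header>"] = ["no @RG line found"]
    return problems


# Made-up header for illustration: read group 32 lacks SM and LB.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:31\tSM:SAMPLE\tLB:lib1\tPL:ILLUMINA",
    "@RG\tID:32\tPL:ILLUMINA",
])
print(missing_rg_tags(example_header))  # {'32': ['SM', 'LB']}
```

An empty result means every read group carries ID, SM and LB. Anything else points at the read groups whose header lines may need adjusting.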
I hope this helps
Thank you very much for your reply.
Interestingly, I have had to run ClinSV twice each time -- first, it detects the samples & creates a project folder + sampleInfo.txt, but then it hangs. Slurm outputs "complete job", but the log never prints anything after ### run jobs ###.
When I run it a second time, it detects the sampleInfo.txt and executes properly, leading to the error here.
I'm guessing this is the source of the issue. What can I do to make it run properly the first time? I suspect your solution will work, if we can get past the "hanging" or whatever is happening.
Has anyone else encountered this? For context, I am using singularity. I can post an example output in a sec. Thank you for your help.
We'll keep an eye on this over in the major refactor #13 thread.
Hmm. Thanks for the reply.
I've since made a few attempts to get around this issue --
Specifically, if I run ClinSV in singularity & feed it a sample file, everything runs okay until running this command in the log:
project_folder_using_separate_sampleInfo/SVs/joined/lumpy/sh/lumpy.caller.joined_1-15.sh
Which fails with "no such file or directory", because there is no BAM when samtools tries to view.
+ meanArr[31]=
+ stdevArr[31]=
++ samtools view /data/LAB_FOLDER/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam 1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ sort -rn
++ awk '(NR==1){print}'
[E::hts_open_format] fail to open file '/data/LAB_FOLDER/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam'
samtools view: failed to open "/data/LAB_FOLDER/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam" for reading: No such file or directory
+ readLArr[32]=
When I feed a sample file to ClinSV, it does not run the ln -s to link the BAMs to the project folder. If I don't feed a sampleInfo, it fails at the ln -s step.
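For reference, the layout that samtools is looking for in the log above is `<project folder>/alignments/<sample>/<sample>.bam`. A minimal sketch of recreating those links by hand follows. It is an assumption drawn from the paths in the log, not ClinSV's own linking code. The helper name `link_bam` is hypothetical, and so is the `.bam.bai` index naming.

```python
import os
import tempfile


def link_bam(bam_path, project_dir, sample):
    """Symlink a BAM (and its index, if present) into <project>/alignments/<sample>/."""
    target_dir = os.path.join(project_dir, "alignments", sample)
    os.makedirs(target_dir, exist_ok=True)
    linked = []
    # The ".bai" suffix is an assumption about how the index file is named.
    for suffix in ("", ".bai"):
        source = os.path.abspath(bam_path + suffix)
        target = os.path.join(target_dir, sample + ".bam" + suffix)
        # Skip missing sources and never overwrite an existing link or file.
        if os.path.exists(source) and not os.path.lexists(target):
            os.symlink(source, target)
            linked.append(target)
    return linked


# Demonstration with an empty placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    bam = os.path.join(tmp, "input.bam")
    open(bam, "w").close()
    demo_links = link_bam(bam, os.path.join(tmp, "project"), "SAMPLE")
    demo_names = [os.path.relpath(path, tmp) for path in demo_links]
print(demo_names)
```

When running inside a container, the link target has to be visible inside the container as well. Otherwise the symlink exists but samtools still reports "No such file or directory".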
I tried manually linking them to where I think ClinSV wants them, but I get this error:
++ sort -rn
++ samtools view /data/[LAB_FOLDER_REMOVED_FOR_PRIVACY]/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam 1:1000000-1100000
++ awk '(NR==1){print}'
++ cut -f 10
+ readLArr[31]=151
+ read -r mean stdev
++ samtools view -r 31 /data/[LAB_FOLDER_REMOVED_FOR_PRIVACY]/project_folder_using_separate_sampleInfo/alignments/bqsr-cleaned-SAMPLE/bqsr-cleaned-SAMPLE.bam
++ python /opt/clinsv/clinSV/scripts/pairend_distro-a1.py -r 151 -X 2 -N 100000 -o /data/[LAB_FOLDER_REMOVED_FOR_PRIVACY]/project_folder_using_separate_sampleInfo/SVs/bqsr-cleaned-SAMPLE/lumpy/31.pe.histo
++ cut -d : -f 2
/opt/clinsv/python/lib/python2.7/site-packages/numpy-1.12.1-py2.7-linux-x86_64.egg/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/opt/clinsv/python/lib/python2.7/site-packages/numpy-1.12.1-py2.7-linux-x86_64.egg/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/opt/clinsv/clinSV/scripts/pairend_distro-a1.py", line 106, in <module>
(removed, upper_cutoff))
TypeError: %d format: a number is required, not numpy.float64
Any insight would be MUCH MUCH appreciated! Thank you
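The "Mean of empty slice" warning in the traceback above suggests that no insert sizes reached the script, for example because `samtools view -r 31` returned no reads for that read group. The statistics then come out as NaN, which the `%d` format cannot print. The sketch below shows the kind of early check that would make this failure easier to read. It is a hypothetical guard using only the standard library, not the code in pairend_distro-a1.py, and the function name `insert_size_stats` is made up.

```python
import statistics


def insert_size_stats(insert_sizes):
    """Return (mean, stdev) of positive insert sizes, failing early on empty input."""
    sizes = [size for size in insert_sizes if size > 0]
    if len(sizes) < 2:
        raise ValueError(
            "need at least 2 insert sizes, got %d; check that the read group "
            "passed to 'samtools view -r' exists in the BAM" % len(sizes)
        )
    return statistics.mean(sizes), statistics.stdev(sizes)


# Normal input yields usable statistics.
print(insert_size_stats([300, 310, 290, 305]))

# Empty input fails with a readable message instead of NaN values.
try:
    insert_size_stats([])
except ValueError as err:
    print("error:", err)
```

To test the same idea on the command line, run the `samtools view -r 31 ...` command from the log by itself and see whether it prints any reads.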
Are you using GRCh38? We've made lots of progress in a new branch and dockerfile + image that supports GRCh38 (see #13). Can you make a singularity image from the latest docker image & test again?
Closing because I believe all of these errors result from the failure at the ln -s step at the start, which is referenced in another issue.
Hi all,
Thanks for building such a cool tool.
I am running ClinSV in Singularity and received an error which I believe has not been addressed before, and I don't think it's due to path issues.
Perhaps it is a result of how my sample files are labeled in SampleInfo, but I don't think so. Here is the error stack:
And then in that file:
All prior steps were successfully run. Thank you in advance for your help.
Hugh