Closed hchetia closed 2 years ago
I'm actually a little confused about why this is happening. It seems like despite your specifying the full path, only the bam file name made it into the shell script. If you just specify one bam with absolute path rather than using the for loop, do you get the same result? If you're using data that is not strand specific you should use the run_L1EM_unstranded.sh script instead. It also looks like your for loop will run L1EM for each bam from the same directory. Each time you run should be in a different directory. Otherwise you will overwrite the results of the previous run with the next run. However neither of these could cause the error your finding.
@wmckerrow Direct runs worked great till the end. But it would have been incredibly easier to be able to implement it in a shell script. Is there a way to modify the tool to allow users to specify an outdir to create files within for each bam file?
I am using stranded data at the moment and actually was using an additional "mkdir & cd" loop per bam file.
Can you add an echo between the do and the bash in your for loop and then me know what the exact commands are that are being executed?
I haven't tried exactly what you're doing, but I have used a for loop to write the L1EM commands into a bunch of shell scripts that I then run. I know that works.
Hi @wmckerrow , Before I run the for loop with echo, I would like to share the log of the successful run which has some warnings. Could you tell me if the run was really successful? File attached.
It looks like it failed to find the hg38 index in step 1. I'd recommend running with the bash -e option (i.e. bash -e run_L1EM.sh etc) to make it stops as soon as it hits the first error. To diagnose what went wrong I'd need the command you entered as well as the results of:
ls /fullpathto/bamfile.bam ls /fullpathto/hg38.fa
Thanks for the suggestion. Sharing the ls results below- ls /fullpathto/bamfile.bam*
Sorted_ID_4239*.bam Sorted_ID_4239*.bam.bai 4239 (the Working directory)
ls /fullpathto/L1EM_hg38_reference/hg38.fa*
L1EM_hg38_reference/hg38.fa L1EM_hg38_reference/hg38.fa.ann L1EM_hg38_reference/hg38.fa.gz L1EM_hg38_reference/hg38.fa.sa L1EM_hg38_reference/hg38.fa.amb L1EM_hg38_reference/hg38.fa.bwt L1EM_hg38_reference/hg38.fa.pac
Is there a "" in bam file name? Because is recognized as a wildcard, including it in the bam file name could be causing errors.
No There is no * being used in the filename or the cmd used. I just put it here because it is a long name.
Can you try it again with this shell script? I added some pre-checks: bash -e run_L1EM_diagnostic.txt etc
Hi @wmckerrow thanks for the diagnostic script. I just figured out that I was making an error from my end in specifying the genome filename. Now the index related error stopped. Here's the log of the run which finished with bash -e. But, the three counts files only have the headers and no other information.
PFA. nohup.txt
Best, Hasna
The error you are seeing in step 4 means that at least one of the bam files created in step 3 was empty. Most often this occurs when the system runs out of memory and silently kills one of the processes. You can try the more memory efficient run_L1EM_withlessmemory.sh script and/or run with more memory if that's an option.
Hi @wmckerrow I am running the program in a cluster with good memory (500 GB). How shall I specify a higher memory for the program?
How you specify memory for a just would depend on your cluster set up. 500GB should be more than enough memory. You can try the run_L1EM_withlessmemory.sh script that I uploaded recently. It sometimes helps with similar errors.
Hi I will try a run with that script and update
Hi @wmckerrow,
I reran with the less memory script and it differently worked further than the older script. However, I am not sure if the program ended successfully or not. I specially see one error about a paired end read set with different read names. I have attached the log of the run here. nohup1.txt Please advise. -Hasna
Hi I am following the steps to run L1EM exactly as suggested in your manual (provided absolute paths for all data) THE CMD: for i in *bam; do bash /L1EM/run_L1EM.sh /l1em/"${i}" /L1EM /l1em/L1EM_hg38_reference/GRCh38.primary_assembly.genome.fa; done;
I get the foll. error: THE ERROR LOG: STEP 1: realign [E::hts_open_format] Failed to open file 4239.bam samtools view: failed to open "4239.bam" for reading: No such file or directory [M::bam2fq_mainloop] discarded 0 singletons [M::bam2fq_mainloop] processed 0 reads [main] Version: 0.7.17-r1188 [main] CMD: L1EM/bin/bwa aln -k 3 -n 3 -t 16 -i 20 /l1em/L1EM_hg38_reference/GRCh38.primary_assembly.genome.fa unaligned.fq1 [main] Real time: 2.559 sec; CPU: 2.467 sec [main] Version: 0.7.17-r1188 [main] CMD: L1EM/bin/bwa aln -k 3 -n 3 -t 16 -i 20 /l1em/L1EM_hg38_reference/GRCh38.primary_assembly.genome.fa unaligned.fq2 [main] Real time: 2.678 sec; CPU: 2.671 sec [main] Version: 0.7.17-r1188 [main] CMD: L1EM/bin/bwa sampe /l1em/L1EM_hg38_reference/GRCh38.primary_assembly.genome.fa 1.sai 2.sai unaligned.fq1 unaligned.fq2 [main] Real time: 0.003 sec; CPU: 0.002 sec STEP 2: extract [E::hts_open_format] Failed to open file 4239.bam Traceback (most recent call last): File "/L1EM/utilities/read_or_pair_overlap_bed.py", line 73, in
main()
File "/L1EM/utilities/read_or_pair_overlap_bed.py", line 39, in main
inbam = pysam.AlignmentFile(bamfile,'rb')
File "pysam/libcalignmentfile.pyx", line 734, in pysam.libcalignmentfile.AlignmentFile.cinit
File "pysam/libcalignmentfile.pyx", line 933, in pysam.libcalignmentfile.AlignmentFile._open
IOError: [Errno 2] could not open alignment file
for read in pysam.AlignmentFile(bamfile):
File "pysam/libcalignmentfile.pyx", line 734, in pysam.libcalignmentfile.AlignmentFile.cinit
File "pysam/libcalignmentfile.pyx", line 933, in pysam.libcalignmentfile.AlignmentFile._open
IOError: [Errno 2] could not open alignment file
main()
File "/L1EM/L1EM.py", line 117, in main
for name in open(TE_list):
IOError: [Errno 2] No such file or directory: 'TE_list.txt'
STEP 6: Writing results
Traceback (most recent call last):
File "/L1EM/utilities/report_l1_exp_counts.py", line 29, in
X_est = dict(zip(pickle.load(open('names_final.pkl')),pickle.load(open('X_final.pkl'))))
IOError: [Errno 2] No such file or directory: 'names_final.pkl'
Traceback (most recent call last):
File "/L1EM/utilities/report_l1hs_transcription.py", line 29, in
X_est = dict(zip(pickle.load(open('names_final.pkl')),pickle.load(open('X_final.pkl'))))
IOError: [Errno 2] No such file or directory: 'names_final.pkl'
Traceback (most recent call last):
File "/L1EM/utilities/filtered_and_normalized_l1hs.py", line 25, in
X_est = dict(zip(pickle.load(open(sys.argv[1])),pickle.load(open(sys.argv[2]))))
IOError: [Errno 2] No such file or directory: 'names_final.pkl'
STEP 7: Clean up
cp: cannot stat ‘*final.pkl’: No such file or directory
4239.bam
: No such file or directory [E::hts_open_format] Failed to open file temp.bam samtools sort: can't open "temp.bam": No such file or directory [M::bam2fq_mainloop] discarded 0 singletons [M::bam2fq_mainloop] processed 0 reads [M::bam2fq_mainloop] discarded 0 singletons [M::bam2fq_mainloop] processed 0 reads STEP 3: candidate alignments [bwa_aln] fail to locate the index [bwa_aln] fail to locate the index [bwa_sai2sam_pe] fail to locate the index STEP 4: G(R) matrix construction [E::hts_open_format] Failed to open file 4239.bam Traceback (most recent call last): File "/L1EM/CGC/median_template_and_pairs.py", line 34, in4239.bam
: No such file or directory usage: G_of_R.py [-h] -b BAMFILE [-e ERROR_PROB] [-m MAX_START2START_LEN] [-r READS_PER_PICKLE] [-p PREFIX] [-n NMDIFF] -i INSERT_MEAN [--flanking FLANKING] [--as_start AS_START] [-w WIGGLE] [--min_len MIN_LEN] [--min_exon_len MIN_EXON_LEN] G_of_R.py: error: argument -i/--insert_mean: expected one argument STEP 5: Expectation maximization ls: cannot access ../G_of_R/pk2: No such file or directory ls: cannot access ../G_of_R/TE_list.txt: No such file or directory cp: missing destination file operand after ‘TE_list.txt’ Try 'cp --help' for more information. Traceback (most recent call last): File "/L1EM/L1EM/L1EM.py", line 168, in