Open alfonsosaera opened 1 month ago
Hi, I come back with an update.
I tried samtools bam2fq
on the same BAM file and it works well.
This is the command
samtools bam2fq -1 paired1.fq -2 paired2.fq -0 read_other.fq -s singleton.fq -n -F 0x900 ${BAMpath}/Cyto_100_trimmed_Aligned.sortedByCoord.out.bam
the terminal messages of samtools
[M::bam2fq_mainloop] discarded 75502250 singletons
[M::bam2fq_mainloop] processed 78473240 reads
and the content of the fastq files, just one example:
head paired1.fq
@LH00169:289:223VVWLT4:3:1234:17039:21760
CAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGACAGAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@LH00169:289:223VVWLT4:3:2305:34267:19840
TGTCATGGAGCCCCCTACGATTCCCAGTCGTCCTCGTCCTCCTCTGCCTGTGGCTGCTGCGGTGGCGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIII
Thanks in advance for your help!
Alfonso
Hello,
The first stage of the first step, $samtools view -@ $threads -b -F 2 $bamfile,
attempts to extract all the unaligned reads from the bam file.
As all the reads in your bam file are aligned correctly, the unaligned files are empty.
78473240 + 0 properly paired (100.00% : N/A)
L1EM uses the unaligned reads as part of its analysis. Could unaligned reads have been filtered out when your bam file was created?
Additionally, L1EM requires paired end read alignments to hg38. If your reads are single end, then you will run into an error in this part of step 1:
$bwa aln -k $realignNM -n $realignNM -t $threads -i $bwa_i $hg38 unaligned.fq1 > 1.sai
$bwa aln -k $realignNM -n $realignNM -t $threads -i $bwa_i $hg38 unaligned.fq2 > 2.sai
Hi @MarkGrivainis and @ekaparos , thanks for your answers!
I aligned my reads with STAR, how should I proceed to generate a BAM suitable for L1EM? I cannot find this information. My reads are paired end and aligned to the hg38.
Thanks for your help!
Alfonso
Hi @alfonsosaera,
I do not have any experience with STAR. However, after looking into the issue, I think STAR discards unmapped reads by default. When aligning the reads, you need to set the --outSAMunmapped Within
flag to store them in the resulting SAM file.
Hi @MarkGrivainis ,
I made it work! Now L1EM analyzed the new BAM with unmapped reads without errors. Thanks for your help!
One of the tree outputs is empty, is this normal?
==> filter_L1HS_FPM.txt <==
family.category.locus.strand only 3prunon
==> full_counts.txt <==
family.category.locus.strand only 3prunon passive_sense passive_antisense antisense
L1PA3.0.chr1:46504027-46504947.- 0.0 0.0 2.0262262054950715 0.0 0.0
L1HS.0.chr8:29120166-29121108.- 0.0 0.0 0.0 3.9653876043043508 0.0
L1PA6.0.chr3:130315158-130315424.- 0.0 0.0 0.999804886384662 0.0 0.0
L1PA3.0.chr16:49134096-49136751.- 0.0 0.0 1.368642759639711 0.0 0.0
L1HS.0.chr10:13217375-13218700.+ 0.0 0.0 0.0 2.2467241734077064 0.0
L1HS.0.chr6:124839973-124841353.- 0.0 0.0 2.1766472804965966 0.0 0.0
L1PA2.0.chr4:161973830-161975754.+ 0.0 0.0 0.0 2.3828525735526958 0.0
L1HS.0.chr15:55986934-55988017.- 0.0 0.0 4.341820378554968 0.0 0.0
L1PA3.0.chr9:119133669-119135888.+ 0.0 0.0 1.1751923595072473 0.0 0.0
==> l1hs_transcript_counts.txt <==
family.category.locus.strand only 3prunon
L1HS.1.chr6:24811657-24817706.- 14.25756758085305 0.0
L1HS.1.chr2:4733729-4739760.+ 5.326441010932766 0.0
L1HS.1.chr2:129415687-129417400.- 5.184223195295114 0.0
L1HS.1.chr4:98592435-98598463.- 1.3552260988037228 0.0
L1HS.1.chrX:11935296-11941314.- 4.675799369376466 0.0
Thanks for your time and suggestions!
Alfonso
For full_counts.txt each of the five transcript types: only, runon, passive (sense), passive (antisense), antisense are reported.
For l1hs_transcript_counts.txt and filter_L1HS_FPM.txt only proper transcription from L1HS elements start at the 5' UTR is reported.
It is possible that filter_L1HS_FPM.txt is empty because there were not enough read pairs (less than 100) to support normalized “proper” (i.e. “only” + “3prunon”) LINE-1 expression from the loci in the active subfamily (L1HS).
What is your model organism/disease context?
Hi,
Thank you for developing this tool for evaluating L1 expression. I'm encountering an issue while running L1EM from a conda environment. Here is my setup and the steps I followed:
Environment and Command
I activated the conda environment and executed the following commands:
Issue
The process starts but fails with the following error:
It appears that there is an issue reading the BAM file, as no reads were processed in STEP 1, which is reflected in the log:
Additional Information
As can be deduced from the name of the BAM file, reads were trimmed before the mapping with STAR.
Here is the output from
samtools flagstat
for the BAM file, in case it's useful for debugging:Any help or advice would be highly appreciated.
Thanks,
Alfonso