Closed lcabus-flomics closed 8 months ago
So the problem with this is that in the STAR output, read1 does not record read2 as its pair. Given this, there is no way for umi_tools to know that it has to output that read.
One can imagine that the STAR output might be preprocessed to take each pair of reads in turn and set them to be each other's pair. So something like:
import pysam
inbam = pysam.AlignmentFile("my_bam_file.bam")
# Mode "w" writes SAM text; "wb" would write BAM
outbam = pysam.AlignmentFile("output.bam", "w", template=inbam)
# until_eof=True reads the file in order and needs no index
all_reads = inbam.fetch(until_eof=True)
try:
    read1 = next(all_reads)
    while True:
        read2 = next(all_reads)
        # Only treat adjacent records as mates if they are read1/read2 of the same name
        if not (read1.is_read1 and read2.is_read2 and read1.query_name == read2.query_name):
            read1 = read2
            continue
        # Point each read at its mate
        read1.next_reference_start = read2.reference_start
        read1.next_reference_id = read2.reference_id
        read2.next_reference_start = read1.reference_start
        read2.next_reference_id = read1.reference_id
        outbam.write(read1)
        outbam.write(read2)
        read1 = next(all_reads)
except StopIteration:
    pass
outbam.close()
inbam.close()
It's a bit of a kludge, but I think it should work, as long as STAR always puts a read's intended pair next to it!
That's the header - it should be present in all BAM files, although samtools doesn't normally show it. Are you viewing this using head? That probably means it is a SAM file.
Try changing the mode of the output file so:
outbam = pysam.AlignmentFile("output.bam", "w", template=inbam)
becomes
outbam = pysam.AlignmentFile("output.bam", "wb", template=inbam)
Yes, sorry, I have seen that it's a SAM file. I have converted it to a .bam using samtools and am now trying umi_tools dedup to see if it works. I will get back with the results, thank you.
It doesn't seem to work: when running RSEM after this, the error message is still the same. Should I do some kind of sorting of the STAR output before running this script?
Can you run the BAM file through RSEM if you don't deduplicate?
Yes, it runs it without any problem
I have also tried with the BAM files produced by running STAR on single-end data and, as expected, the deduplication and RSEM work fine. The problem is only with STAR on paired-end data.
I can see what the problem is. It will need some further scripts to fix, and I think I can see a fix within UMI-tools itself. I'll see if I can get a chance to have a go at implementing this at the weekend.
Okay, thank you very much!
I'm about to start working on this. I'm going to implement something in UMI-tools that should deal with it long term, but while I'm doing that, have you tried running convert-sam-for-rsem from RSEM? If you have, and that didn't work, I will code up a quick stopgap.
The answer to that is no: convert-sam-for-rsem doesn't work!
I have created a new script, umi_tools prepare-for-rsem, that should take the output from dedup and produce a file compatible with RSEM.
It's on the {IS}_prepare-for-rsem branch. Would you be able to grab that branch, install it and give it a try? Let me know how it goes.
Okay, thank you very much! As soon as I have the results I will post here
Hi! Regarding this issue, I have already produced the .bam file from the dedup command, but when I call the script mentioned by @IanSudbery using umi_tools prepare-for-rsem sample_dedup_Aligned.toTranscriptome.out.bam, I receive the following message. It does not stop the execution, but nothing else is produced:
#UMI-tools version: 1.1.1
#output generated by prepare-for-rsem LC001_dedup_Aligned.toTranscriptome.out.bam
#job started at Wed Apr 14 10:13:54 2021 on flomics-ThinkPad-L580 -- c89235d8-1f03-4918-b2fb-f07649720fd6
#pid: 1576937, system: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64
#compresslevel : 6
#log2stderr : False
#loglevel : 1
#random_seed : None
#sam : False
#short_help : None
#stderr : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
#stdin : <_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>
#stdlog : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
#stdout : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
#tags : UG,BX
#timeit_file : None
#timeit_header : None
#timeit_name : all
#tmpdir : None
I have made sure that I have the correct branch (I suppose that otherwise it would not find prepare-for-rsem.py) and that all the dependencies are installed. I also tried running it in a virtual environment with more memory than my machine (32 GB rather than 16 GB), but it still stalls.
Thank you very much!
Have you sorted by name first? The command requires that the input BAM file is name sorted, or at least collated (i.e. all reads with the same name together).
I guess I could add that to the script so that it would take input directly from dedup.
Hi! Thank you for your answer! I thought it only needed to be indexed, so I had not sorted it. I have now sorted it with samtools sort and re-launched the command with the sorted BAM some time ago, but it's still "frozen" like before.
Thanks again!
It needs to be name sorted rather than position sorted (i.e. samtools sort -n or samtools collate). This is kind of annoying because it has to be position sorted for dedup.
So the process would be:
position sort -> index -> dedup -> name sort/samtools collate -> prepare-for-rsem
prepare-for-rsem doesn't require the input to be indexed (indeed, I don't think it's possible to index a name sorted file).
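The ordering above can be sketched as a small Python helper that only builds the command lines (it does not run them). The helper name build_pipeline and the file names derived from the prefix are hypothetical; the flags are the ones that appear elsewhere in this thread.

```python
import shlex


def build_pipeline(raw_bam, prefix):
    """Return the commands, in order, for dedup followed by prepare-for-rsem.

    The file names derived from `prefix` are illustrative assumptions.
    """
    pos_sorted = f"{prefix}.pos_sorted.bam"
    deduped = f"{prefix}.dedup.bam"
    name_sorted = f"{prefix}.dedup.name_sorted.bam"
    rsem_ready = f"{prefix}.rsem_ready.bam"
    return [
        # dedup needs a position-sorted, indexed BAM
        ["samtools", "sort", "-o", pos_sorted, raw_bam],
        ["samtools", "index", pos_sorted],
        ["umi_tools", "dedup", "--paired",
         f"--stdin={pos_sorted}", f"--stdout={deduped}"],
        # prepare-for-rsem needs reads with the same name together
        ["samtools", "sort", "-n", "-o", name_sorted, deduped],
        ["umi_tools", "prepare-for-rsem", "-I", name_sorted,
         f"--stdout={rsem_ready}"],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("Aligned.toTranscriptome.out.bam", "sample"):
        print(shlex.join(cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) if you wanted to execute the steps rather than just print them.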
Hi!
Sorry for the delayed response! I am still having some trouble using the solutions proposed here. What I am doing is:
samtools sort -o out.bam in.sam
-> samtools index sorted.bam
-> umi_tools dedup --stdin=sorted.bam --log=dedup.log --stdout=dedup.out.bam --paired --multimapping-detection-method=NH --unpaired-reads=discard --chimeric-pairs=discard
-> samtools sort -n -o dedup_sorted.out.bam dedup.out.bam
-> umi_tools prepare-for-rsem dedup_sorted.out.bam
I am getting stuck on the prepare-for-rsem step. How long should it take to complete? I think it freezes or does not produce any output.
Thanks in advance for all the help!
Sorry, can you try umi_tools prepare-for-rsem -I dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam?
I get the following error when running umi_tools prepare-for-rsem -I POOLAM1-40.umi_dedup.sorted_by_name.bam --stdout ready_for_rsem.out.bam
# UMI-tools version: 1.1.1
# output generated by prepare-for-rsem -I POOLAM1-40.umi_dedup.sorted_by_name.bam --stdout ready_for_rsem.out.bam
# job started at Wed Jul 14 18:34:02 2021 on zeus.icbi.local -- b6246918-93a7-4678-bab6-638475ceb9bb
# pid: 29147, system: Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020 x86_64
# compresslevel : 6
# log2stderr : False
# loglevel : 1
# random_seed : None
# sam : False
# short_help : None
# stderr : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin : <_io.TextIOWrapper name='POOLAM1-40.umi_dedup.sorted_by_name.bam' mode='r' encoding='UTF-8'>
# stdlog : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout : <_io.TextIOWrapper name='ready_for_rsem.out.bam' mode='w' encoding='UTF-8'>
# tags : UG,BX
# timeit_file : None
# timeit_header : None
# timeit_name : all
# tmpdir : None
[E::idx_find_and_load] Could not retrieve index file for 'POOLAM1-40.umi_dedup.sorted_by_name.bam'
Traceback (most recent call last):
File "/data/scratch/sturm/conda/envs/test_salmon/bin/umi_tools", line 8, in <module>
sys.exit(main())
File "/data/scratch/sturm/conda/envs/test_salmon/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
module.main(sys.argv)
File "/data/scratch/sturm/conda/envs/test_salmon/lib/python3.8/site-packages/umi_tools/prepare-for-rsem.py", line 164, in main
mate = current_template[not read.is_read1][mate_key_primary][0]
IndexError: list index out of range
Actually, I don't want to run RSEM, but Salmon in alignment mode on a deduplicated BAM file. Unlike RSEM, Salmon runs through, but the results do not make sense (almost all genes are quantified with 0 reads). I was hoping that it would be the same underlying issue.
It's difficult to tell without access to the data file, but my first instinct would be that there is an unpaired read here. I need to have a careful look at the code, but I think prepare-for-rsem might assume that there is a read1 for every read2, and if there isn't, it gets upset. I'll check the code, but the absolute best thing would be if you could share a small example BAM that causes the problem.
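To illustrate the suspected failure mode, here is a minimal sketch. It is not the prepare-for-rsem code: the dictionary layout and the helper find_mate are hypothetical, and only mimic a lookup that takes the first element of a list of candidate mates.

```python
from collections import defaultdict


def find_mate(mates_by_key, key):
    """Return the first candidate mate for `key`, or None if there is none.

    Indexing `mates_by_key[key][0]` directly raises IndexError when the list
    of candidates is empty, e.g. for a read whose mate is not in the file.
    """
    candidates = mates_by_key[key]
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Hypothetical layout: (reference, position) -> list of read2 names
    read2_by_position = defaultdict(list)
    read2_by_position[("tx1", 100)].append("readA/2")

    print(find_mate(read2_by_position, ("tx1", 100)))  # mate present
    print(find_mate(read2_by_position, ("tx1", 500)))  # orphan read1

    try:
        read2_by_position[("tx1", 900)][0]  # unguarded lookup
    except IndexError as err:
        print("unguarded lookup failed:", err)
```

Whether an orphan read should then be dropped or written with an unmapped mate is a separate design decision that this sketch does not address.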
Hi @IanSudbery,
you can access the original and deduplicated BAM files here. They are aligned to the mm10 genome and gencode.vM25.primary_assembly.annotation.gtf using STAR and the nf-core RNA-Seq pipeline.
There was some additional discussion about this with @drpatelh and @rob-p on the nf-core slack [1] [2].
Here's the summary:
According to @rob-p, this issue is likely the reason why Salmon fails to produce reasonable counts on my data. Here are the assumptions Salmon makes about the input files:
Salmon expects that for a paired-end fragment, the alignment records for all alignments of this fragment are consecutive in the file, and that the alignment for end2 is directly after the corresponding alignment for end1.
[Salmon] is aware of orphan alignments (where only one end aligns), it needs to account for this because e.g. implied fragment size can only be computed when you have a proper paired-end alignment, but it then still expects the unmapped record for the second end of the pair to remain in the file (just like RSEM).
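The adjacency requirement described above can be checked with a short sketch. The Record tuple is a hypothetical stand-in for an alignment record (read name plus SAM flag), and the check is an illustration, not Salmon's or RSEM's own validation.

```python
from typing import NamedTuple

READ1 = 0x40  # SAM flag bit: first segment in the template
READ2 = 0x80  # SAM flag bit: last segment in the template


class Record(NamedTuple):
    name: str
    flag: int


def mates_are_adjacent(records):
    """True if the list consists of consecutive (read1, read2) pairs by name."""
    if len(records) % 2:
        return False
    for first, second in zip(records[0::2], records[1::2]):
        if first.name != second.name:
            return False
        if not (first.flag & READ1 and second.flag & READ2):
            return False
    return True


if __name__ == "__main__":
    paired = [Record("a", 0x43), Record("a", 0x83),
              Record("b", 0x43), Record("b", 0x83)]
    read1_only = [Record("a", 0x43), Record("b", 0x43)]
    print(mates_are_adjacent(paired))
    print(mates_are_adjacent(read1_only))
```

A file containing only read1 records fails this check even when every record is flagged as properly paired.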
When running samtools flagstat on the umi-deduplicated BAM file, we see that the read2s are missing.
(test_salmon) sturm@zeus [SSH] test_umitools % samtools flagstat POOLAM1-40.umi_dedup.transcriptome.sorted.bam
5768256 + 0 in total (QC-passed reads + QC-failed reads)
2660439 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5768256 + 0 mapped (100.00% : N/A)
3107817 + 0 paired in sequencing
3107817 + 0 read1
0 + 0 read2
3107817 + 0 properly paired (100.00% : N/A)
3107817 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
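How the read1 and read2 lines relate to the SAM flag bits can be sketched as follows. This is a simplified illustration, not samtools itself; it assumes, as the counts above suggest (secondary plus paired equals the total), that secondary and supplementary records are tallied separately from the read1/read2 counts.

```python
from collections import Counter

PAIRED, READ1, READ2 = 0x1, 0x40, 0x80
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def tally_flags(flags):
    """Count records by category from an iterable of integer SAM flags."""
    counts = Counter()
    for flag in flags:
        counts["total"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        elif flag & PAIRED:
            counts["paired"] += 1
            if flag & READ1:
                counts["read1"] += 1
            if flag & READ2:
                counts["read2"] += 1
    return counts


if __name__ == "__main__":
    # Two primary read1 records and one secondary record, no read2 at all
    for key, value in tally_flags([0x43, 0x43, 0x143]).items():
        print(key, value)
```

With input like this the read2 count stays at zero, which is the pattern in the flagstat output above.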
it then still expects the unmapped record for the second end of the pair to remain in the file (just like RSEM).
I bet this is the problem. We use mapping coordinates to find the mates. Unmapped reads are just dumped. We could output all unmapped reads, but then you'd have unmapped mates of reads that had not been selected after deduplication. It's not entirely clear how to get around this: I'll have to have a think.
Yup, we kinda figured it had to do with some weirdness in the way reads are filtered from paired-end data. Given that UMI-tools is traversing the name sorted BAM file, would it be possible to have some sort of read buffer, so that if you encounter reads with the same name where one is flagged as a duplicate and the other is unmapped, you dump both? That would be the expected behaviour, right?
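The buffering idea can be sketched with itertools.groupby over a stream in which reads with the same name are adjacent. The Record tuple and complete_templates are hypothetical, and this is not how UMI-tools is implemented; it only shows that a template lacking either mate can then be kept or dropped as a unit.

```python
from itertools import groupby
from typing import NamedTuple

READ1, READ2 = 0x40, 0x80  # SAM flag bits for first/last segment


class Record(NamedTuple):
    name: str
    flag: int


def complete_templates(records):
    """Yield records only from templates that contain both read1 and read2.

    `records` must already be grouped by name (name sorted or collated).
    """
    for _, group in groupby(records, key=lambda r: r.name):
        buffered = list(group)  # the per-template read buffer
        has_read1 = any(r.flag & READ1 for r in buffered)
        has_read2 = any(r.flag & READ2 for r in buffered)
        if has_read1 and has_read2:
            yield from buffered


if __name__ == "__main__":
    stream = [Record("a", 0x43), Record("a", 0x83), Record("b", 0x43)]
    for record in complete_templates(stream):
        print(record.name, hex(record.flag))
```

Only one template is held in memory at a time, so the buffer stays small as long as the input really is grouped by name.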
I also found it weird that all read2 are being dumped in the example above even though they appear to be mapped in the original BAM. Could it have something to do with secondary / supplementary alignments?
Oh. I missed that - I misread your message as "some read2s are missing", and only glanced at the flagstats. It's definitely wrong that all read2s are being dumped. Are you running dedup with --paired? We ignore secondary/supplementary flags: we treat all reads the same, irrespective of primary/secondary/supplementary status.
UMI-tools count/group/dedup traverses the position sorted BAM, not name sorted. Perhaps, if the problem is unmapped read2 mates of mapped read1s (and I'm not so sure any more), the solution is to output all unmapped reads, and then alter prepare-for-rsem to get rid of the excess unmapped mates.
Sorry, I had a dig around in the workflow repo, but I'm not really that familiar with Nextflow, and as best I can tell, whether it's run with --paired or not will depend on the configuration.
I did notice that you are running with --output-stats though, which we generally don't recommend, because it's often very resource intensive.
Ah, pants!! No, we aren't giving a --paired flag to the dedup command, as you can see here 🤦🏽 It should just be a case of adding a couple of lines to the module on nf-core/modules, like here. We should also probably remove --output-stats, as you suggested. @grst fancy seeing if this is all a problem on our end? Sorry, I don't know how I missed the --paired option when I added support to nf-core/rnaseq.
UMI-tools count/group/dedup traverses the position sorted BAM, not name sorted.
Yep, I misread the other comments above too! prepare-for-rsem requires a name sorted BAM and not dedup 👍🏽
Yeah, I thought that it might be coming from $options.args. There is still a potential theoretical problem with situations where read1 is aligned but read2 isn't, so I'll see what I can think of for that.
No we aren't giving a --paired flag to the dedup command as you can see here 🤦🏽 Should just be a case of adding a couple of lines to the module on nf-core/modules like here. We should also probably remove --output-stats as you suggested too. @grst fancy seeing if this is all a problem on our end?
Sure thing, will update the module. I'm not sure whether I'll manage it today, though, and then I'm on 🌴 until next Thursday.
Thanks! No worries! Maybe we should create a small issue on nf-core/rnaseq linking here so we are able to track it?
Hi! I'm sorry for my six-month-long silence, but I had a lot of other things to do and we put solving this issue on the low-priority pile. Now it has come up again, so I retraced all my steps and I still get stuck at the prepare-for-rsem step. Also, if I skip it (going straight from umi_tools dedup to rsem calculate-expression), I run into a new error that I had not seen before (I am using other samples but the steps are the same), which reads:
The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should be adjacent)
I know it's a bother reopening this issue, but I will be glad to test whatever to help fix it!
Thanks in advance :)
I am facing the same issue, and prepare-for-rsem fails to run. Ian, is there any news on prepare-for-rsem and the RSEM compatibility?
Hi, Sorry, this got lost in the rush of the new teaching semester. I've really only got time for the highest priority stuff. I'll try to have another look at this before Christmas though, as I do seem to remember having an idea.
Just quickly, @ctuni: did you try running prepare-for-rsem as umi_tools prepare-for-rsem -I dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam?
The error you are getting from RSEM is exactly the reason we wrote prepare-for-rsem, which I think works for most people. @avivdemorgan, could you also let me know how prepare-for-rsem fails to run? Is there an error message?
Hi! Yes, I tried the prepare-for-rsem command using the syntax you suggested some time ago. I am now getting the following error from prepare-for-rsem:
umi_tools prepare-for-rsem -I step2.bam --stdout=step4.bam
# UMI-tools version: 1.1.1
# output generated by prepare-for-rsem -I step2.bam --stdout=step4.bam
# job started at Thu Dec 2 08:50:03 2021 on flomics-ThinkPad-L580 -- 52ccc656-0cde-4bef-b340-6013e746f8cf
# pid: 43390, system: Linux 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64
# compresslevel : 6
# log2stderr : False
# loglevel : 1
# random_seed : None
# sam : False
# short_help : None
# stderr : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin : <_io.TextIOWrapper name='step2.bam' mode='r' encoding='UTF-8'>
# stdlog : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout : <_io.TextIOWrapper name='step4.bam' mode='w' encoding='UTF-8'>
# tags : UG,BX
# timeit_file : None
# timeit_header : None
# timeit_name : all
# tmpdir : None
Traceback (most recent call last):
File "/home/ctuni/anaconda3/bin/umi_tools", line 8, in <module>
sys.exit(main())
File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
module.main(sys.argv)
File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/prepare-for-rsem.py", line 203, in main
outbam.write(key)
TypeError: Argument 'read' has incorrect type (expected pysam.libcalignedsegment.AlignedSegment, got tuple)
What I did was, after umi_tools dedup, build the index for the BAM file and pass it to prepare-for-rsem.
Sorry for the delay and thank you very much!
Hi Ian (and @ctuni),
Here is what I did:
1) Sorted the deduplicated BAM file with samtools sort and then indexed it.
2) Ran umi_tools prepare-for-rsem -I sample.Aligned.toTranscriptome.out.sorted.dedup.bam --stdout=ready.bam. The error message is written below.
3) NOTE: when sorting by name (samtools sort -n or samtools collate), indices cannot be generated, and prepare-for-rsem fails: it reports a missing index file, and then exits with the EXACT Traceback error message as below.
And the error was:
***@***.***:~/runs/exp38/dedup$ umi_tools prepare-for-rsem -I sample.Aligned.toTranscriptome.out.sorted.dedup.bam --stdout=ready.bam
# UMI-tools version: 1.1.1
# output generated by prepare-for-rsem -I sample.Aligned.toTranscriptome.out.sorted.dedup.bam --stdout=ready.bam
# job started at Thu Dec 2 09:12:20 2021 on cogent -- 1b2dfe8b-657d-4860-922f-c382a155b768
# pid: 11661, system: Linux 4.15.0-130-generic #134-Ubuntu SMP Tue Jan 5 20:46:26 UTC 2021 x86_64
# compresslevel : 6
# log2stderr : False
# loglevel : 1
# random_seed : None
# sam : False
# short_help : None
# stderr : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
# stdin : <_io.TextIOWrapper name='sample.Aligned.toTranscriptome.out.sorted.dedup.bam' mode='r' encoding='UTF-8'>
# stdlog : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
# stdout : <_io.TextIOWrapper name='ready.bam' mode='w' encoding='UTF-8'>
# tags : UG,BX
# timeit_file : None
# timeit_header : None
# timeit_name : all
# tmpdir : None
Traceback (most recent call last):
File "/home/user/.venvs/venv_umitools/bin/umi_tools", line 11, in <module>
load_entry_point('umi-tools==1.1.1', 'console_scripts', 'umi_tools')()
File "/home/user/.venvs/venv_umitools/lib/python3.6/site-packages/umi_tools/umi_tools.py", line 61, in main
module.main(sys.argv)
File "/home/user/.venvs/venv_umitools/lib/python3.6/site-packages/umi_tools/prepare-for-rsem.py", line 203, in main
outbam.write(key)
TypeError: Argument 'read' has incorrect type (expected pysam.libcalignedsegment.AlignedSegment, got tuple)
I appreciate your assistance in this matter.
Best,
Aviv.
Ah! Now that I can fix!
I've pushed a new commit to the {IS}_prepare-for-rsem branch. Would you mind pulling it and seeing if it works?
Ian
Thanks, Ian.
I will do it tomorrow, as soon as I can, and report back to you.
Thanks for your work fixing this!
Best,
Aviv.
Hi Ian,
This morning I did:
$ git clone --branch {IS}_prepare-for-rsem https://github.com/CGATOxford/UMI-tools/
cd, and installed:
$ pip install -r requirements.txt
and still, it fails with the same error on a deduplicated BAM file, after sorting (not sort -n or collate) and indexing.
Best,
Aviv.
By "it fails with the same error", do you mean RSEM fails with the same error, or prepare-for-rsem fails with the same error?
prepare-for-rsem can't possibly fail with exactly the same error if the new version has installed correctly, because the line causing the error is no longer the same. This would suggest that the installation hadn't worked.
You can try "python setup.py install"
Can you also confirm that you are doing: position sort -> index -> dedup -> name sort/samtools collate -> prepare-for-rsem?
Hi Ian,
Thanks for the e-mail.
I will test it again today, perhaps I did something wrong.
Best,
Aviv.
Hi Ian,
I followed the steps you suggested (indeed, the previous installation was incorrect), and the error output was:
(venv_umitools) ***@***.***:~/Desktop/umitools/dedup$ umi_tools prepare-for-rsem -I sample_S43_R1_001.Aligned.toTranscriptome.out.sorted.dedup.sorted.bam --stdout=rsem_ready.bam
# output generated by prepare-for-rsem -I sample_S43_R1_001.Aligned.toTranscriptome.out.sorted.dedup.sorted.bam --stdout=rsem_ready.bam
# job started at Tue Dec 7 10:57:22 2021 on user -- 258f1d18-b6ea-47b0-9cf7-7b1957e60d57
# pid: 696301, system: Linux 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64
# compresslevel : 6
# log2stderr : False
# loglevel : 1
# random_seed : None
# sam : False
# short_help : None
# stderr : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin : <_io.TextIOWrapper name='sample_S43_R1_001.Aligned.toTranscriptome.out.sorted.dedup.sorted.bam' mode='r' encoding='UTF-8'>
# stdlog : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout : <_io.TextIOWrapper name='rsem_ready.bam' mode='w' encoding='UTF-8'>
# tags : UG,BX
# timeit_file : None
# timeit_header : None
# timeit_name : all
# tmpdir : None
[E::idx_find_and_load] Could not retrieve index file for 'sample_S43_R1_001.Aligned.toTranscriptome.out.sorted.dedup.sorted.bam'
Traceback (most recent call last):
File "/home/user/venv_umitools/bin/umi_tools", line 11, in <module>
load_entry_point('umi-tools==1.1.2', 'console_scripts', 'umi_tools')()
File "/home/user/venv_umitools/lib/python3.8/site-packages/umi_tools-1.1.2-py3.8-linux-x86_64.egg/umi_tools/umi_tools.py", line 61, in main
module.main(sys.argv)
File "/home/user/venv_umitools/lib/python3.8/site-packages/umi_tools-1.1.2-py3.8-linux-x86_64.egg/umi_tools/prepare-for-rsem.py", line 164, in main
mate = current_template[not read.is_read1][mate_key_primary][0]
IndexError: list index out of range
Awaiting your input,
Sincerely,
Aviv.
On 07/12/2021 0:54, Ian Sudbery wrote:
By "it fails with the same error", do you mean RSEM fails with the
same
error, or prepare-for-rsem fails with the same error?
prepare-for-rsem can't possibly fail with exactly the same error
if the new
version has installed correctly, because the line causing the
error is not
longer the same. This would suggest that the installation hadn't
worked.
You can try
"python setup.py install"
Can you also confirm that you are doing:
position sort -> index -> dedup -> name sort/samtools
collate ->
prepare-for-rsem?
On Sun, 5 Dec 2021 at 10:04, avivdemorgan ***@***.***> wrote:
>
>
> body p { margin-bottom: 0cm; margin-top: 0pt; }
>
>
> Hi Ian,
>
> This morning I did:
> $ git clone --branch
> {IS}_prepare-for-rsem
https://github.com/CGATOxford/UMI-tools/
>
> cd, and installed:
> $ pip install -r
> requirements.txt
> and still, it fails with same error on a deduplicated bam
file,
> after sorting (not sort
> -n or collate),
> and indexing.
>
> Best,
> Aviv.
>
> On 04/12/2021 19:08, Ian Sudbery wrote:
>
>
>
> Ah! Now that I can fix!
> I've push a new commit to the {IS}_prepare-for-rsem
> branch. Would you mind pull it and seeing if it works?
> Ian
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub, or
unsubscribe.
> Triage notifications on the go with GitHub Mobile for iOS or
Android.
>
> [
> {
> ***@***.***": "http://schema.org",
> ***@***.***": "EmailMessage",
> "potentialAction": {
> ***@***.***": "ViewAction",
> "target": "
>
https://github.com/CGATOxford/UMI-tools/issues/465#issuecomment-986059766
", "url": "
https://github.com/CGATOxford/UMI-tools/issues/465#issuecomment-986059766 ", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { @.***": "Organization", "name": "GitHub", "url": "https://github.com" } } ]
-- Aviv De Morgan, Ph.D Head of Bioinformatics Pyxis Diagnostics High Tech Village Givat-Ram, POB 39158 Jerusalem 91391 Israel +972-2-6553333 @.***
Hello!
I have been able to try the new version of the prepare-for-rsem branch and, indeed, the previous error no longer appears :)
But some others do. To recap, I obtained the transcriptome-aligned BAM files with STAR, position sorted them, created the index file, ran the dedup command, name sorted the result, and then ran umi_tools prepare-for-rsem. I also tried running prepare-for-rsem on the deduped file directly, without name sorting it, so that I could create an index and see what happens. The error I obtained using prepare-for-rsem with the name-sorted file is the following:
umi_tools prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam
# UMI-tools version: 1.1.2
# output generated by prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam
# job started at Thu Dec 9 12:22:41 2021 on flomics-ThinkPad-L580 -- ddae99d7-c386-42a5-84d7-8f7a78caf60d
# pid: 340867, system: Linux 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64
# compresslevel : 6
# log2stderr : False
# loglevel : 1
# random_seed : None
# sam : False
# short_help : None
# stderr : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin : <_io.TextIOWrapper name='SRR9113885_dedup_sorted.out.bam' mode='r' encoding='UTF-8'>
# stdlog : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout : <_io.TextIOWrapper name='ready_for_rsem.out.bam' mode='w' encoding='UTF-8'>
# tags : UG,BX
# timeit_file : None
# timeit_header : None
# timeit_name : all
# tmpdir : None
[E::idx_find_and_load] Could not retrieve index file for 'SRR9113885_dedup_sorted.out.bam'
Traceback (most recent call last):
File "/home/ctuni/anaconda3/bin/umi_tools", line 8, in <module>
sys.exit(main())
File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
module.main(sys.argv)
File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/prepare-for-rsem.py", line 164, in main
mate = current_template[not read.is_read1][mate_key_primary][0]
IndexError: list index out of range
Because of the message about not being able to retrieve the index file, I then tried the deduped BAM that was not name sorted. I created its index file and ran prepare-for-rsem on it. The message about a missing index did not show, but I think the error was the same:
umi_tools prepare-for-rsem -I SRR9113885_dedup.out.bam --stdout=ready_for_rsem.out.bam
# UMI-tools version: 1.1.2
# output generated by prepare-for-rsem -I SRR9113885_dedup.out.bam --stdout=ready_for_rsem.out.bam
# job started at Thu Dec 9 12:23:45 2021 on flomics-ThinkPad-L580 -- c632d846-0c10-4602-90d4-23cd6ca6f494
# pid: 340936, system: Linux 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64
# compresslevel : 6
# log2stderr : False
# loglevel : 1
# random_seed : None
# sam : False
# short_help : None
# stderr : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin : <_io.TextIOWrapper name='SRR9113885_dedup.out.bam' mode='r' encoding='UTF-8'>
# stdlog : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout : <_io.TextIOWrapper name='ready_for_rsem.out.bam' mode='w' encoding='UTF-8'>
# tags : UG,BX
# timeit_file : None
# timeit_header : None
# timeit_name : all
# tmpdir : None
Traceback (most recent call last):
File "/home/ctuni/anaconda3/bin/umi_tools", line 8, in <module>
sys.exit(main())
File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
module.main(sys.argv)
File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/prepare-for-rsem.py", line 164, in main
mate = current_template[not read.is_read1][mate_key_primary][0]
IndexError: list index out of range
I think I may be missing something there, so I apologize in advance if I skipped a necessary step to make it work. Thank you very much for the patience and help!
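For readers following the tracebacks above: the failing line indexes [0] into a list of candidate mates, so an IndexError is what one would expect whenever that list is empty, for example when dedup kept a read but not its mate. The sketch below is not the umi_tools code. The nested layout (a dict keyed by "is read1", then by a mate key) and all names are assumptions made to illustrate a lookup that returns None instead of raising.

```python
from collections import defaultdict


def new_template():
    # Hypothetical layout: template[is_read1][mate_key] -> list of alignments
    return {True: defaultdict(list), False: defaultdict(list)}


def find_mate(template, is_read1, mate_key):
    """Return the first stored mate alignment, or None if there is none."""
    # Look on the opposite side of the pair; .get avoids creating empty entries.
    candidates = template[not is_read1].get(mate_key, [])
    return candidates[0] if candidates else None


template = new_template()
template[True][("ENST00000417570", 211)].append("read1 alignment")

# A read2 looking for its read1 mate finds it.
print(find_mate(template, False, ("ENST00000417570", 211)))
# A read1 whose read2 is absent gets None rather than an IndexError.
print(find_mate(template, True, ("ENST00000417570", 300)))
```

A caller can then decide to skip or report the orphaned alignment instead of crashing.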
Hi Cristina,
Thanks for your e-mail.
I have the same error message, and I e-mailed Ian about this.
I hope Ian will resolve this sometime soon.
Best,
Aviv.
I've pushed another update. It runs without error on my test file, but it would be helpful to have a better test file to work with, if one of you would be willing to share a BAM or a subset of a BAM with me.
Hi Ian,
Many thanks for your prompt answer!
Sure, I will share some BAM files with you, using Dropbox.
I will do it later on today.
And I am most grateful for your assistance in this issue.
Best wishes,
Aviv.
Anyone try the latest PR out to see if it works?
Hello! Thanks for the follow up! I can share some BAM files and also test the new PR tomorrow. Thank you very much and sorry for the delay :)
Okay! So I have been able to test the new version of the prepare-for-rsem command and it seems to work correctly! The output is the following:
umi_tools prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam
# UMI-tools version: 1.1.2
# output generated by prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam
# job started at Fri Dec 17 16:29:28 2021 on ip-172-31-20-55 -- 308eea8d-1f69-413c-bc58-169b39fd0c7c
# pid: 9532, system: Linux 5.11.0-1022-aws #23~20.04.1-Ubuntu SMP Mon Nov 15 14:03:19 UTC 2021 x86_64
# compresslevel : 6
# log2stderr : False
# loglevel : 1
# random_seed : None
# sam : False
# short_help : None
# stderr : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin : <_io.TextIOWrapper name='SRR9113885_dedup_sorted.out.bam' mode='r' encoding='UTF-8'>
# stdlog : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout : <_io.TextIOWrapper name='ready_for_rsem.out.bam' mode='w' encoding='UTF-8'>
# tags : UG,BX
# timeit_file : None
# timeit_header : None
# timeit_name : all
# tmpdir : None
[E::idx_find_and_load] Could not retrieve index file for 'SRR9113885_dedup_sorted.out.bam'
2021-12-17 16:29:38,323 WARNING Alignment SRR9113885.8167_CCTCATGGTC 83 ENST00000417570 211 has no mate -- skipped
2021-12-17 16:29:46,455 WARNING Alignment SRR9113885.189982_AGTTAGATGT 83 ENST00000652551 2697 has no mate -- skipped
2021-12-17 16:29:46,456 WARNING Alignment SRR9113885.189984_GTCTCGATGT 83 ENST00000480246 1934 has no mate -- skipped
2021-12-17 16:29:46,456 WARNING Alignment SRR9113885.189986_TGATGTCTAA 83 ENST00000652551 2320 has no mate -- skipped
2021-12-17 16:29:50,418 WARNING Alignment SRR9113885.232824_TGTCAAGGGC 83 ENST00000536769 3536 has no mate -- skipped
2021-12-17 16:29:50,419 WARNING Alignment SRR9113885.232826_TGCAGGGTGG 83 ENST00000538617 1134 has no mate -- skipped
2021-12-17 16:29:57,019 WARNING Alignment SRR9113885.358443_GCCAGCTGTT 83 ENST00000536769 3250 has no mate -- skipped
2021-12-17 16:30:03,023 WARNING Alignment SRR9113885.505059_TGCCAGTGAG 83 ENST00000339647 1846 has no mate -- skipped
2021-12-17 16:30:08,450 WARNING Alignment SRR9113885.598608_GGGTCTTGGT 83 ENST00000619423 7081 has no mate -- skipped
2021-12-17 16:30:11,483 WARNING Alignment SRR9113885.640196_TGGGACTTTT 83 ENST00000583866 4197 has no mate -- skipped
2021-12-17 16:30:14,593 WARNING Alignment SRR9113885.736786_GTGTGTGGTG 83 ENST00000567736 2148 has no mate -- skipped
2021-12-17 16:30:19,006 WARNING Alignment SRR9113885.821797_TGGTACTTTT 83 ENST00000617010 6393 has no mate -- skipped
2021-12-17 16:30:25,211 WARNING Alignment SRR9113885.969358_GTGGGGTGCG 83 ENST00000567736 1821 has no mate -- skipped
2021-12-17 16:30:47,795 WARNING Alignment SRR9113885.1108129_TTCTGGATGT 83 ENST00000339647 1777 has no mate -- skipped
2021-12-17 16:30:47,795 WARNING Alignment SRR9113885.1108130_TTCTGGATGT 83 ENST00000536769 3059 has no mate -- skipped
2021-12-17 16:30:47,796 WARNING Alignment SRR9113885.1108131_TCTGGATGTT 83 ENST00000339647 1546 has no mate -- skipped
2021-12-17 16:30:47,797 WARNING Alignment SRR9113885.1108134_GGATGTTGTA 83 ENST00000536769 3281 has no mate -- skipped
2021-12-17 16:30:47,799 WARNING Alignment SRR9113885.1108138_TGTCCATCTT 83 ENST00000339647 1748 has no mate -- skipped
2021-12-17 16:30:50,814 WARNING Alignment SRR9113885.1166981_TGTTGTAGTC 83 ENST00000339647 859 has no mate -- skipped
2021-12-17 16:31:05,304 WARNING Alignment SRR9113885.1451517_GCTCCCTTAT 83 ENST00000568624 2465 has no mate -- skipped
2021-12-17 16:31:06,167 WARNING Alignment SRR9113885.1473660_GTCCACTTTT 83 ENST00000378024 13068 has no mate -- skipped
2021-12-17 16:31:06,167 WARNING Alignment SRR9113885.1473661_GTCCACTTTT 83 ENST00000378024 13068 has no mate -- skipped
2021-12-17 16:31:43,303 WARNING Alignment SRR9113885.2203956_TGTGTGGGGT 83 ENST00000567736 1941 has no mate -- skipped
2021-12-17 16:32:26,064 WARNING Alignment SRR9113885.3073150_TAGGGGTCTG 83 ENST00000567736 2259 has no mate -- skipped
2021-12-17 16:32:26,588 WARNING Alignment SRR9113885.3083739_CTTGGGTCTT 83 ENST00000536769 3191 has no mate -- skipped
2021-12-17 16:32:47,692 WARNING Alignment SRR9113885.3431410_GCCTTGACAT 83 ENST00000536769 3184 has no mate -- skipped
2021-12-17 16:32:47,693 WARNING Alignment SRR9113885.3431411_GACATTCTCA 83 ENST00000536769 3175 has no mate -- skipped
2021-12-17 16:32:47,693 WARNING Alignment SRR9113885.3431413_AGCTCCACTT 83 ENST00000339647 1644 has no mate -- skipped
2021-12-17 16:32:53,268 WARNING Alignment SRR9113885.3511521_CTGCCCACGA 83 ENST00000513405 225 has no mate -- skipped
2021-12-17 16:32:53,874 WARNING Alignment SRR9113885.3523928_GTCCAGCTGT 83 ENST00000339647 1968 has no mate -- skipped
2021-12-17 16:32:57,301 INFO Total pairs output: 2734136, Pairs skipped - no mates: 30, Pairs skipped - not read1 or 2: 0
# job finished in 208 seconds at Fri Dec 17 16:32:57 2021 -- 208.75 3.53 0.00 0.00 -- 308eea8d-1f69-413c-bc58-169b39fd0c7c
I find those warning messages confusing because I made sure to remove singleton reads, but anyway, I ended up with the ready_for_rsem.out.bam file I wanted!
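One way to read those warnings, assuming the number after the read name is the SAM flag, is to decode it bit by bit. The sketch below uses the flag bits from the SAM specification. The short names in the table are just labels chosen for this example.

```python
# SAM flag bits as defined in the SAM specification; the labels are illustrative.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(83))  # ['paired', 'proper_pair', 'reverse', 'read1']
```

Under that reading, every skipped alignment is a read1 that the aligner marked as properly paired. The flag records what the aligner saw, so it stays set even if the read2 record is no longer in the file. That would also explain why removing singletons beforehand did not help, since singleton filters usually act on the flag and not on whether the mate record is actually present.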
I then launched
rsem-calculate-expression -paired-end --num-threads 15 --temporary-folder tmp/ --alignments ready_for_rsem.out.bam RSEM/GRCh38_ref final
and redirected the standard output to a log file because it was really long, but the final message was
rsem-run-em: RefSeq.h:85: int RefSeq::get_id(int, int) const: Assertion `pos >= 0 && pos < totLen' failed.
To give some insight into the differences between the files before and after umi_tools prepare-for-rsem, here is the samtools flagstat output for both the input and the output file of the command:
$ samtools flagstat ready_for_rsem.out.bam
5468272 + 0 in total (QC-passed reads + QC-failed reads)
2086706 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5468272 + 0 mapped (100.00% : N/A)
3381566 + 0 paired in sequencing
1690783 + 0 read1
1690783 + 0 read2
3381566 + 0 properly paired (100.00% : N/A)
3381566 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
$ samtools flagstat SRR9113885_dedup_sorted.out.bam
3707176 + 0 in total (QC-passed reads + QC-failed reads)
1870528 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
3707176 + 0 mapped (100.00% : N/A)
1836648 + 0 paired in sequencing
918339 + 0 read1
918309 + 0 read2
1836648 + 0 properly paired (100.00% : N/A)
1836648 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
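The two flagstat reports can be cross-checked against the prepare-for-rsem summary line with a little arithmetic. The numbers below are copied from the outputs above. The dictionary layout and variable names are only for this sketch, and the interpretation in the comments is a plausible reading rather than something the tools report themselves.

```python
flagstat = {
    "ready_for_rsem.out.bam": {
        "total": 5468272, "secondary": 2086706, "read1": 1690783, "read2": 1690783,
    },
    "SRR9113885_dedup_sorted.out.bam": {
        "total": 3707176, "secondary": 1870528, "read1": 918339, "read2": 918309,
    },
}
pairs_output = 2734136   # "Total pairs output" from the prepare-for-rsem log
pairs_skipped = 30       # "Pairs skipped - no mates" from the same log

out = flagstat["ready_for_rsem.out.bam"]
inp = flagstat["SRR9113885_dedup_sorted.out.bam"]

# The output holds exactly two records per reported pair.
print(out["total"] == 2 * pairs_output)                   # True
# The input has 30 more read1 than read2 records, matching the skipped count.
print(inp["read1"] - inp["read2"] == pairs_skipped)       # True
# read1 and read2 are balanced in the output.
print(out["read1"] == out["read2"])                       # True
# Records present in the output but not in the input.
print(out["total"] - inp["total"])                        # 1761096
```

So the output contains more records than the input, which would fit prepare-for-rsem writing out mates for alignments whose partner was not itself kept by dedup.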
Please find attached in this Google Drive folder the following files:
SRR9113885_Aligned.toTranscriptome.out.bam: The initial BAM file as output by STAR
SRR9113885_dedup_sorted.out.bam: The processed BAM file after sorting, indexing, deduplicating, and name sorting, used as the input for prepare-for-rsem
ready_for_rsem.out.bam: The BAM file output by prepare-for-rsem and given to rsem-calculate-expression
rsem.log: The messages output by RSEM
As always, thank you very much for your time and help! :)
Hi Ian,
Thanks for the e-mail.
Which branch to pull?
Best,
Aviv.
Hi Ian,
I pulled branch {IS}_prepare_for_rsem and installed.
This time, prepare-for-rsem completed without errors (last lines only):
.
.
2021-12-19 12:29:57,111 WARNING Alignment A01032:183:HKT5GDRXY:2:2278:20573:31015_TTATCGGG 99 ENST00000581296.2 3251 has no mate -- skipped
2021-12-19 12:29:57,752 WARNING Alignment A01032:183:HKT5GDRXY:2:2278:29460:30342_CCTCTCCG 99 ENST00000283195.11 4960 has no mate -- skipped
2021-12-19 12:29:58,085 INFO Total pairs output: 2354942, Pairs skipped - no mates: 415, Pairs skipped - not read1 or 2: 0
# job finished in 538 seconds at Sun Dec 19 12:29:58 2021 -- 534.52 3.09 0.00 0.00 -- 77b19c92-7519-47f6-bdef-698c43c3a912
And then rsem-calculate-expression gave this:
(venv_umitools) ***@***.***:~/Desktop/umitools/dedup$ rsem-calculate-expression -p 2 --no-bam-output --paired-end --alignments rsem_ready.bam /home/aviv/Desktop/umitools/rsem_ref_gencode_v38/rsem_ref_gencode counts
rsem-parse-alignments /home/aviv/Desktop/umitools/rsem_ref_gencode_v38/rsem_ref_gencode counts.temp/counts counts.stat/counts rsem_ready.bam 3 -tag XM
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG and A01032:183:HKT5GDRXY:2:2252:1651:15812_GTCGCTAA!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG and A01032:183:HKT5GDRXY:2:2252:1651:15812_GTCGCTAA!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG and A01032:183:HKT5GDRXY:2:2228:29812:27258_CTAGTATT!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:2:2252:1651:15812_GTCGCTAA and A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:2:2252:1651:15812_GTCGCTAA and A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:2:2228:29812:27258_CTAGTATT and A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1805:19460_CTCATGGA and A01032:183:HKT5GDRXY:1:2153:16749:21042_CCCCTAGA!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2153:16749:21042_CCCCTAGA and A01032:183:HKT5GDRXY:1:2101:1805:19460_CTCATGGA!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1805:24189_TTCTCGCA and A01032:183:HKT5GDRXY:2:2109:17951:22874_AGATGGAT!
Paired-end read A01032:183:HKT5GDRXY:1:2101:1805:24189_TTCTCGCA has alignments with inconsistent mate lengths!
"rsem-parse-alignments /home/aviv/Desktop/umitools/rsem_ref_gencode_v38/rsem_ref_gencode counts.temp/counts counts.stat/counts rsem_ready.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!
I think this error is equivalent to yours and differs only in its wording, since we probably sorted the deduplicated BAMs differently, i.e., samtools sort -n vs. samtools collate.
I hope this is helpful.
Best wishes,
Aviv.
Hi,
I'm using umi_tools dedup to remove PCR duplicates from a STAR alignment to the transcriptome. When I run RSEM after the deduplication, it seems that some reads have lost their mates, since the program exits with the following error:
Read ST-E00114:1178:HFL75CCX2:7:1101:1610:55297_TTGCCATCTC: The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should be adjacent)
I ran the dedup command with
--paired --multimapping-detection-method=NH --unpaired-reads=discard --chimeric-pairs=discard --unmapped_reads=discard
I have seen that this problem was already discussed in #384, but no solution is given there. Do you have any idea how to solve this issue, or a workaround that could work for this case?
Thank you very much
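The RSEM error quoted above says that the two mates of a pair must sit on adjacent lines. A rough way to find the offending records before handing a file to RSEM is sketched below. It works on plain (query_name, is_read1) tuples in file order, and the function name is hypothetical. With pysam, such tuples could be produced from read.query_name and read.is_read1 while iterating over the BAM with fetch(until_eof=True), as in the script earlier in this thread. This check only looks at names and read1/read2 status, which is an assumption about what RSEM requires, not a full reimplementation of its validation.

```python
def unpaired_records(records):
    """Return records that are not part of an adjacent read1/read2 pair.

    records: iterable of (query_name, is_read1) tuples in file order.
    Two neighbouring records count as a pair when they share a name and
    exactly one of them is read1.
    """
    recs = list(records)
    orphans = []
    i = 0
    while i < len(recs):
        has_next = i + 1 < len(recs)
        if has_next and recs[i][0] == recs[i + 1][0] and recs[i][1] != recs[i + 1][1]:
            i += 2          # a complete pair, step over both mates
        else:
            orphans.append(recs[i])
            i += 1          # an orphan, re-examine the next record on its own
    return orphans


records = [("r1", True), ("r1", False), ("r2", True), ("r3", True), ("r3", False)]
print(unpaired_records(records))  # [('r2', True)]
```

An empty result means every record has its mate next to it. Anything returned is a candidate for the kind of read that triggers the RSEM message.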