lcabus-flomics commented 3 years ago

Hi,

I'm using umi_tools dedup to remove PCR duplicates from an alignment to the transcriptome with STAR. After the deduplication, when I run RSEM, it seems that there are some reads from the pairs that are lost since the program exits with the following error: Read ST-E00114:1178:HFL75CCX2:7:1101:1610:55297_TTGCCATCTC: The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should be adjacent)

I ran the command with --paired --multimapping-detection-method=NH --unpaired-reads=discard --chimeric-pairs=discard --unmapped_reads=discard

I have seen that this problem was already discussed in #384, but there is not an option on how to solve this. Do you have any idea on how to solve this issue or a workaround that could work for this case?

Thank you very much

IanSudbery commented 3 years ago

So the problem with this is that in the STAR output, the read1 does not say that the read2 is its pair. Given this, there is no way for umi_tools to know that it has to output that read.

One can imagine that the STAR output might be preprocessed to take each pair of reads in turn and set them to be each other's pair. So something like:


import pysam

inbam = pysam.AlignmentFile("my_bam_file.bam")
outbam = pysam.AlignmentFile("output.bam", "w", template=inbam)

# until_eof=True reads records in file order without needing an index
all_reads = inbam.fetch(until_eof=True)

try:
    read1 = next(all_reads)

    while True:
        read2 = next(all_reads)

        # Skip ahead until read1/read2 are adjacent mates with the same name
        if not (read1.is_read1 and read2.is_read2 and read1.query_name == read2.query_name):
            read1 = read2
            continue

        # Point each mate at the other
        read1.next_reference_start = read2.reference_start
        read1.next_reference_id = read2.reference_id
        read2.next_reference_start = read1.reference_start
        read2.next_reference_id = read1.reference_id

        outbam.write(read1)
        outbam.write(read2)

        read1 = next(all_reads)

except StopIteration:
    pass

# Close explicitly so the output is flushed to disk
inbam.close()
outbam.close()

It's a bit of a kludge, but it should work I think, as long as STAR always puts a read's intended pair next to it!

IanSudbery commented 3 years ago

That's the header - it should be present in all BAM files, although samtools doesn't normally show it. Are you viewing this using head? If so, that probably means it is a SAM file.

Try changing the mode of the output file so: outbam = pysam.AlignmentFile("output.bam", "w", template=inbam)

becomes outbam = pysam.AlignmentFile("output.bam", "wb", template=inbam)
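The mode string, not the file name, decides what pysam writes: "w" produces SAM text and "wb" produces BAM. A minimal sketch of picking the mode from the extension follows; `output_mode` is a hypothetical helper written for illustration and is not part of pysam or UMI-tools.

```python
import os

# pysam.AlignmentFile write modes: "w" = SAM text, "wb" = BAM, "wc" = CRAM.
_MODES = {".sam": "w", ".bam": "wb", ".cram": "wc"}


def output_mode(path):
    """Return the pysam write mode matching the file extension (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return _MODES[ext]
    except KeyError:
        raise ValueError(f"unrecognised alignment file extension: {ext!r}")


# Intended use (needs pysam and a real input file, so it is left as a comment):
#   outbam = pysam.AlignmentFile("output.bam", output_mode("output.bam"), template=inbam)
```

With this, a file named output.bam would be opened with "wb" rather than silently written as SAM.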

lcabus-flomics commented 3 years ago

Yes, sorry, I have seen that it's a SAM file. I have converted it into a .bam using samtools and now I'm trying umi_tools dedup to see if it works. I will get back with the results, thank you.

lcabus-flomics commented 3 years ago

It doesn't seem to work: when running RSEM after this, the error message is still the same. Should I do some type of sorting of the results of STAR before running this script?

IanSudbery commented 3 years ago

Can you run the BAM file through RSEM if you don't deduplicate?

lcabus-flomics commented 3 years ago

Yes, it runs it without any problem

lcabus-flomics commented 3 years ago

I also have tried with the bam files when running STAR with single-end and, as expected, the deduplication and RSEM work fine. The problem is only the STAR with paired-end.

IanSudbery commented 3 years ago

I can see what the problem is. It will need some further scripts to fix, and I think I can see a fix within UMI-tools itself. I'll see if I can get chance to have a go at implementing this at the weekend.

lcabus-flomics commented 3 years ago

Okay, thank you very much!

IanSudbery commented 3 years ago

I'm about to start working on this - I'm going to implement something into UMI-tools that should deal with it long term, but while I'm doing that, have you tried running the convert-sam-for-rsem from RSEM? If you have, and that didn't work, I will code up a quick stop gap.

IanSudbery commented 3 years ago

The answer to that is no: convert-sam-for-rsem doesn't work!

I have created a new script, umi_tools prepare-for-rsem, that should take the output from dedup and produce a file compatible with RSEM.

It's on the {IS}_prepare-for-rsem branch. Would you be able to grab that branch, install it and give it a try? Let me know how it goes.

lcabus-flomics commented 3 years ago

Okay, thank you very much! As soon as I have the results I will post here

ctuni commented 3 years ago

Hi! Regarding this issue, I have already produced the .bam file from the dedup command, but I get stuck when I try to call the script mentioned by @IanSudbery, which I call using umi_tools prepare-for-rsem sample_dedup_Aligned.toTranscriptome.out.bam. I receive the following message; execution does not stop, but nothing else is produced:

#UMI-tools version: 1.1.1 
#output generated by prepare-for-rsem LC001_dedup_Aligned.toTranscriptome.out.bam
#job started at Wed Apr 14 10:13:54 2021 on flomics-ThinkPad-L580 -- c89235d8-1f03-4918-b2fb-f07649720fd6
 #pid: 1576937, system: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64
 #compresslevel                           : 6
 #log2stderr                              : False
 #loglevel                                : 1
 #random_seed                             : None
 #sam                                     : False
 #short_help                              : None
 #stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
 #stdin                                   : <_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>
 #stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
 #stdout                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
 #tags                                    : UG,BX
 #timeit_file                             : None
 #timeit_header                           : None
 #timeit_name                             : all
 #tmpdir                                  : None

I have made sure that I have the correct branch (I suppose if I did not have it it would not find the prepare-for-rsem.py) and have all the dependencies installed. I also tried to run it in a virtual environment with more memory than my machine (32 GB and 16 GB) but it still becomes stalled.

Thank you very much!

IanSudbery commented 3 years ago

Have you sorted by name first? The command requires that the input BAM file is name sorted, or at least collated (i.e. all reads with the same name together).

I guess I could add that to the script so that it would take input directly from dedup.
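Whether a file is collated can be checked before running the command. The sketch below is a minimal illustration, not part of UMI-tools: `is_collated` is a hypothetical helper that takes the read names in file order and reports whether each name forms one contiguous run. It keeps every name in memory, so it is only meant for a quick check.

```python
def is_collated(names):
    """Return True if every read name occupies one contiguous run in `names`."""
    finished = set()  # names whose run has already ended
    previous = None
    for name in names:
        if name != previous:
            if name in finished:
                return False  # the name reappears after other reads intervened
            if previous is not None:
                finished.add(previous)
            previous = name
    return True


# Intended use with pysam (needs a real file, so it is left as a comment):
#   bam = pysam.AlignmentFile("dedup.bam")
#   is_collated(read.query_name for read in bam.fetch(until_eof=True))
```

A position-sorted paired-end file will normally fail this check, because the two mates of a read map to different coordinates and so are separated by other reads.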

ctuni commented 3 years ago

Hi! Thank you for your answer! I thought it only needed to be indexed, so I had not sorted it. I have now sorted it with samtools sort and re-launched the command with the sorted bam some time ago, but it's still "frozen" like before.

Thanks again!

IanSudbery commented 3 years ago

It needs to be name sorted rather than position sorted (i.e. samtools sort -n or samtools collate). This is kind of annoying because it has to be position sorted for dedup.

So the process would be:

position sort -> index -> dedup -> name sort/samtools collate -> prepare-for-rsem

prepare-for-rsem doesn't require the input to be indexed (indeed, I don't think it's possible to index a name sorted file).
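The five steps above can be sketched as a small driver script. This is only an illustration under stated assumptions: the output file names are made up, only the options already mentioned in this thread are used (add your own dedup options such as --multimapping-detection-method as needed), and prepare-for-rsem is only available once the {IS}_prepare-for-rsem branch is installed.

```python
import subprocess


def build_pipeline(in_bam, prefix):
    """Return the five steps as argv lists; names derived from `prefix` are illustrative."""
    pos_sorted = f"{prefix}.sorted.bam"
    dedup = f"{prefix}.dedup.bam"
    name_sorted = f"{prefix}.dedup.namesorted.bam"
    ready = f"{prefix}.ready_for_rsem.bam"
    return [
        # 1. position sort, as required by dedup
        ["samtools", "sort", "-o", pos_sorted, in_bam],
        # 2. index the position-sorted file
        ["samtools", "index", pos_sorted],
        # 3. deduplicate in paired-end mode
        ["umi_tools", "dedup", f"--stdin={pos_sorted}", f"--stdout={dedup}", "--paired"],
        # 4. name sort, as required by prepare-for-rsem (no index needed afterwards)
        ["samtools", "sort", "-n", "-o", name_sorted, dedup],
        # 5. restore mate adjacency for RSEM
        ["umi_tools", "prepare-for-rsem", "-I", name_sorted, f"--stdout={ready}"],
    ]


def run_pipeline(in_bam, prefix):
    """Run each step in order, stopping at the first failure."""
    for argv in build_pipeline(in_bam, prefix):
        subprocess.run(argv, check=True)
```

Calling `run_pipeline("Aligned.toTranscriptome.out.bam", "sample")` would execute the steps in order; `build_pipeline` on its own just returns the commands so they can be inspected first.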

ctuni commented 3 years ago

Hi!

Sorry for the delayed response! I am still having some troubles using the solutions proposed here. What I am doing is:

samtools sort -o out.bam in.sam -> samtools index sorted.bam -> umi_tools dedup --stdin=sorted.bam --log=dedup.log --stdout=dedup.out.bam --paired --multimapping-detection-method=NH --unpaired-reads=discard --chimeric-pairs=discard -> samtools sort -n -o dedup_sorted.out.bam dedup.out.bam -> umi_tools prepare-for-rsem dedup_sorted.out.bam

I am getting stuck on the prepare-for-rsem step, how long should it take to complete? I think it freezes or does not produce an output.

Thanks in advance for all the help!

IanSudbery commented 3 years ago

Sorry, can you try umi_tools prepare-for-rsem -I dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam

grst commented 3 years ago

I get the following error when running umi_tools prepare-for-rsem -I POOLAM1-40.umi_dedup.sorted_by_name.bam --stdout ready_for_rsem.out.bam

# UMI-tools version: 1.1.1
# output generated by prepare-for-rsem -I POOLAM1-40.umi_dedup.sorted_by_name.bam --stdout ready_for_rsem.out.bam
# job started at Wed Jul 14 18:34:02 2021 on zeus.icbi.local -- b6246918-93a7-4678-bab6-638475ceb9bb
# pid: 29147, system: Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020 x86_64
# compresslevel                           : 6
# log2stderr                              : False
# loglevel                                : 1
# random_seed                             : None
# sam                                     : False
# short_help                              : None
# stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin                                   : <_io.TextIOWrapper name='POOLAM1-40.umi_dedup.sorted_by_name.bam' mode='r' encoding='UTF-8'>
# stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout                                  : <_io.TextIOWrapper name='ready_for_rsem.out.bam' mode='w' encoding='UTF-8'>
# tags                                    : UG,BX
# timeit_file                             : None
# timeit_header                           : None
# timeit_name                             : all
# tmpdir                                  : None
[E::idx_find_and_load] Could not retrieve index file for 'POOLAM1-40.umi_dedup.sorted_by_name.bam'
Traceback (most recent call last):
  File "/data/scratch/sturm/conda/envs/test_salmon/bin/umi_tools", line 8, in <module>
    sys.exit(main())
  File "/data/scratch/sturm/conda/envs/test_salmon/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/data/scratch/sturm/conda/envs/test_salmon/lib/python3.8/site-packages/umi_tools/prepare-for-rsem.py", line 164, in main
    mate = current_template[not read.is_read1][mate_key_primary][0]
IndexError: list index out of range

Actually, I don't want to run RSEM, but Salmon in alignment mode on a deduplicated BAM file. Unlike RSEM, Salmon runs through, but the results do not make sense (almost all genes are quantified with 0 reads). I was hoping that it would be the same underlying issue.

IanSudbery commented 3 years ago

It's difficult to tell without access to the data file, but my first instinct would be that there is an unpaired read here - I need to have a careful look at the code, but I think prepare-for-rsem might assume that there is a read1 for every read2, and if there isn't, it gets upset. I'll check the code, but the absolute best thing would be if you could share a small example BAM that was causing the problem.
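One way to test the unpaired-read hypothesis on a given file is to list the read names that appear as only read1 or only read2. The sketch below is a diagnostic written for illustration; `find_orphans` is a hypothetical helper and does not reproduce the internals of prepare-for-rsem.

```python
from collections import defaultdict


def find_orphans(records):
    """Return the names seen as only read1 or only read2.

    `records` is any iterable of (query_name, is_read1) pairs.
    """
    seen = defaultdict(set)
    for name, is_read1 in records:
        seen[name].add("read1" if is_read1 else "read2")
    return sorted(name for name, ends in seen.items() if len(ends) == 1)


# Intended use with pysam (needs a real file, so it is left as a comment):
#   bam = pysam.AlignmentFile("dedup.namesorted.bam")
#   find_orphans((r.query_name, r.is_read1) for r in bam.fetch(until_eof=True))
```

An empty result would suggest the IndexError has another cause; a non-empty one gives concrete read names to look up in the input file.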

grst commented 3 years ago

Hi @IanSudbery,

you can access the original and deduplicated BAM files here. They are aligned to the mm10 genome and gencode.vM25.primary_assembly.annotation.gtf using STAR and the nf-core RNA-Seq pipeline.

There was some additional discussion about this with @drpatelh and @rob-p on the nf-core slack [1] [2].

Here's the summary:

According to @rob-p, this issue is likely the reason why Salmon fails to produce reasonable counts on my data. Here are the assumptions Salmon makes about the input files:

Salmon expects that for a paired-end fragment, the alignment records for all alignments of this fragment are consecutive in the file, and that the alignment for end2 is directly after the corresponding alignment for end1.

[Salmon] is aware of orphan alignments (where only one end aligns), it needs to account for this because e.g. implied fragment size can only be computed when you have a proper paired-end alignment, but it then still expects the unmapped record for the second end of the pair to remain in the file (just like RSEM).
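The adjacency assumption quoted above can be checked mechanically. The sketch below is a simplified illustration: it only tests that records come in consecutive read1/read2 pairs with matching names, and `Rec` is a stand-in for the three `pysam.AlignedSegment` attributes it uses. Neither name comes from Salmon, RSEM or UMI-tools.

```python
from collections import namedtuple

# Stand-in for the pysam.AlignedSegment attributes used below.
Rec = namedtuple("Rec", "query_name is_read1 is_read2")


def first_unpaired_index(records):
    """Return the index of the first record that breaks read1/read2 adjacency, or None."""
    it = iter(records)
    index = 0
    for first in it:
        second = next(it, None)  # the record expected to be the mate
        if (
            second is None
            or not first.is_read1
            or not second.is_read2
            or first.query_name != second.query_name
        ):
            return index
        index += 2
    return None


# Intended use with pysam (needs a real file, so it is left as a comment):
#   bam = pysam.AlignmentFile("dedup.bam")
#   first_unpaired_index(bam.fetch(until_eof=True))
```

A file in which every read2 has been dropped fails at index 0, since the second record is another read1.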

When running samtools flagstat on the umi-deduplicated BAM file, we see that the read2s are missing.

(test_salmon) sturm@zeus [SSH] test_umitools % samtools flagstat POOLAM1-40.umi_dedup.transcriptome.sorted.bam 
5768256 + 0 in total (QC-passed reads + QC-failed reads)
2660439 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5768256 + 0 mapped (100.00% : N/A)
3107817 + 0 paired in sequencing
3107817 + 0 read1
0 + 0 read2
3107817 + 0 properly paired (100.00% : N/A)
3107817 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

For comparison, this is the flagstat of the original BAM file as produced by STAR

```console
(test_salmon) sturm@zeus [1] [SSH] test_umitools % samtools flagstat POOLAM1-40.Aligned.toTranscriptome.out.bam
27624680 + 0 in total (QC-passed reads + QC-failed reads)
12712284 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
27624680 + 0 mapped (100.00% : N/A)
14912396 + 0 paired in sequencing
7456198 + 0 read1
7456198 + 0 read2
14912396 + 0 properly paired (100.00% : N/A)
14912396 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
```
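The read1 and read2 lines in flagstat come from the SAM flag bits. A minimal sketch of the same tally follows, assuming (consistently with the numbers above, where read1 equals total minus secondary in the deduplicated file) that secondary and supplementary records are excluded from these counts; `count_mates` is a hypothetical helper, not a samtools or pysam function.

```python
# SAM flag bits as defined in the SAM specification.
PAIRED = 0x1
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_mates(flags):
    """Count read1/read2 among paired records, skipping secondary/supplementary ones."""
    counts = {"read1": 0, "read2": 0}
    for flag in flags:
        if not flag & PAIRED or flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & READ1:
            counts["read1"] += 1
        if flag & READ2:
            counts["read2"] += 1
    return counts


# Intended use with pysam (needs a real file, so it is left as a comment):
#   bam = pysam.AlignmentFile("dedup.bam")
#   count_mates(read.flag for read in bam.fetch(until_eof=True))
```

A read2 count of zero next to a non-zero read1 count, as in the deduplicated file, means no second mates were written at all.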

IanSudbery commented 3 years ago

it then still expects the unmapped record for the second end of the pair to remain in the file (just like RSEM).

I bet this is the problem. We use mapping coordinates to find the mates. Unmapped reads are just dumped. We could output all unmapped reads, but then you'd have unmapped mates of reads that had not been selected after deduplication. It's not entirely clear how to get around this: I'll have to have a think.

drpatelh commented 3 years ago

Yup, we kinda figured it had to do with some weirdness with the way reads are filtered from paired-end reads. Given that UMI-tools is traversing the name sorted BAM file would it be possible to have some sort of read buffer where if you encounter reads with the same name where one is flagged as a duplicate and the other is unmapped then dump both? That would be the expected behaviour right?

I also found it weird that all read2 are being dumped in the example above even though they appear to be mapped in the original BAM? Could it have something to do with secondary / supplementary alignments?

IanSudbery commented 3 years ago

Oh. I missed that - I misread your message as "some read2s are missing", and only glanced at the flagstats. It's definitely wrong that all read2s are being dumped. You are running dedup with --paired? We ignore secondary/supplementary flags - we treat all reads the same, irrespective of primary/secondary/supplementary status.

UMI-tools count/group/dedup traverses the position sorted BAM, not name sorted. Perhaps, if the problem is unmapped read2 mates of mapped read1s (and I'm not so sure any more), the solution is to output all unmapped reads, and then alter prepare-for-rsem to get rid of the excess unmapped mates.
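The pruning step suggested here could look roughly like the sketch below. It is only an illustration of the idea, not what UMI-tools does: `prune_unmapped` is a hypothetical helper, and it assumes each record can be reduced to its name and an unmapped flag.

```python
def prune_unmapped(records):
    """Keep mapped records, plus unmapped ones whose name also has a mapped record.

    `records` is an iterable of (query_name, is_unmapped) pairs.
    """
    records = list(records)
    # Names that survived deduplication with at least one mapped record.
    mapped_names = {name for name, is_unmapped in records if not is_unmapped}
    return [
        (name, is_unmapped)
        for name, is_unmapped in records
        if not is_unmapped or name in mapped_names
    ]
```

Here an unmapped mate survives only if its mapped partner was retained, which removes the "excess" unmapped mates of reads that were discarded as duplicates.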

IanSudbery commented 3 years ago

Sorry, I had a dig around in the workflow repo, but I'm not really that familiar with NextFlow, and as best I can tell, whether it's run with --paired or not will depend on the configuration.

I did notice that you are running with --output-stats though, which we generally don't recommend, because it's often very resource intensive.

drpatelh commented 3 years ago

Ah, pants!! No we aren't giving a --paired flag to the dedup command as you can see here 🤦🏽 Should just be a case of adding a couple of lines to the module on nf-core/modules like here. We should also probably remove --output-stats as you suggested too. @grst fancy seeing if this is all a problem on our end? Sorry, I don't know how I missed the --paired option when I added support to nf-core/rnaseq.

UMI-tools count/group/dedup traverses the position sorted BAM, not name sorted.

Yep, I misread the other comments above too! prepare-for-rsem requires a name sorted BAM and not dedup 👍🏽

IanSudbery commented 3 years ago

Yeah, I thought that it might be coming from $options.args. There is still a potential theoretical problem with situations where read1 is aligned but read2 isn't, so I'll see what I can think of for that.

grst commented 3 years ago

No we aren't giving a --paired flag to the dedup command as you can see here 🤦🏽 Should just be a case of adding a couple of lines to the module on nf-core/modules like here. We should also probably remove --output-stats as you suggested too. @grst fancy seeing if this is all a problem on our end?

Sure thing, will update the module. Not sure, though, if I'll manage it today, and then I'm on 🌴 until next Thursday.

drpatelh commented 3 years ago

Thanks! No worries! Maybe we should create a small issue on nf-core/rnaseq linking here so we are able to track it?

ctuni commented 3 years ago

Hi! I'm sorry for my 6 month long silence, but I had to do a lot of things and we kind of put solving this issue in the low-priority pile. Now it has come up again, so I re-traced all my steps and I still get stuck on the prepare-for-rsem step. Also, without doing it (if I try to do umi_tools dedup -> rsem calculate-expression), I run into an error I had not seen before (I am using other samples but the steps are the same), which reads: The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should be adjacent) I know it's a bother reopening this issue, but I will be glad to test whatever is needed to help fix it! Thanks in advance :)

avivdemorgan commented 2 years ago

I am facing the same issue, and prepare-for-rsem fails to run. Ian, is there any news on prepare-for-rsem and the RSEM compatibility?

IanSudbery commented 2 years ago

Hi, Sorry, this got lost in the rush of the new teaching semester. I've really only got time for the highest priority stuff. I'll try to have another look at this before Christmas though, as I do seem to remember having an idea.

IanSudbery commented 2 years ago

Just quickly @ctuni, did you try running prepare-for-rsem as umi_tools prepare-for-rsem -I dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam?

The error you are getting from rsem is exactly the reason we wrote prepare-for-rsem, which I think works for most people. @avivdemorgan could you also let me know how prepare-for-rsem fails to run? Is there an error message?

ctuni commented 2 years ago

Hi! Yes, I tried the prepare-for-rsem command using the syntax you suggested some time ago. I am now getting the following error from the prepare-for-rsem command:

umi_tools prepare-for-rsem -I step2.bam --stdout=step4.bam
# UMI-tools version: 1.1.1
# output generated by prepare-for-rsem -I step2.bam --stdout=step4.bam
# job started at Thu Dec  2 08:50:03 2021 on flomics-ThinkPad-L580 -- 52ccc656-0cde-4bef-b340-6013e746f8cf
# pid: 43390, system: Linux 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64
# compresslevel                           : 6
# log2stderr                              : False
# loglevel                                : 1
# random_seed                             : None
# sam                                     : False
# short_help                              : None
# stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin                                   : <_io.TextIOWrapper name='step2.bam' mode='r' encoding='UTF-8'>
# stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout                                  : <_io.TextIOWrapper name='step4.bam' mode='w' encoding='UTF-8'>
# tags                                    : UG,BX
# timeit_file                             : None
# timeit_header                           : None
# timeit_name                             : all
# tmpdir                                  : None
Traceback (most recent call last):
  File "/home/ctuni/anaconda3/bin/umi_tools", line 8, in <module>
    sys.exit(main())
  File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/prepare-for-rsem.py", line 203, in main
    outbam.write(key)
TypeError: Argument 'read' has incorrect type (expected pysam.libcalignedsegment.AlignedSegment, got tuple)

What I did was after umi_tools dedup I built the index for the bam file and passed it to prepare-for-rsem.

Sorry for the delay and thank you very much!

avivdemorgan commented 2 years ago

Hi Ian (and @ctuni),

Here is what I did:
1) sorted the deduplicated bam file with samtools sort and then indexed it
2) ran umi_tools prepare-for-rsem -I sample.Aligned.toTranscriptome.out.sorted.dedup.bam --stdout=ready.bam. The error message is written below.
3) NOTE: when sorting by name (samtools sort -n or samtools collate), indices cannot be generated, and prepare-for-rsem fails: it reports a missing index file, and then exits with exactly the same traceback as below.

And the error was:

    ***@***.***:~/runs/exp38/dedup$ umi_tools prepare-for-rsem -I sample.Aligned.toTranscriptome.out.sorted.dedup.bam --stdout=ready.bam
    # UMI-tools version: 1.1.1
    # output generated by prepare-for-rsem -I sample.Aligned.toTranscriptome.out.sorted.dedup.bam --stdout=ready.bam
    # job started at Thu Dec  2 09:12:20 2021 on cogent -- 1b2dfe8b-657d-4860-922f-c382a155b768
    # pid: 11661, system: Linux 4.15.0-130-generic #134-Ubuntu SMP Tue Jan 5 20:46:26 UTC 2021 x86_64
    # compresslevel                           : 6
    # log2stderr                              : False
    # loglevel                                : 1
    # random_seed                             : None
    # sam                                     : False
    # short_help                              : None
    # stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
    # stdin                                   : <_io.TextIOWrapper name='sample.Aligned.toTranscriptome.out.sorted.dedup.bam' mode='r' encoding='UTF-8'>
    # stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
    # stdout                                  : <_io.TextIOWrapper name='ready.bam' mode='w' encoding='UTF-8'>
    # tags                                    : UG,BX
    # timeit_file                             : None
    # timeit_header                           : None
    # timeit_name                             : all
    # tmpdir                                  : None
    Traceback (most recent call last):
      File "/home/user/.venvs/venv_umitools/bin/umi_tools", line 11, in <module>
        load_entry_point('umi-tools==1.1.1', 'console_scripts', 'umi_tools')()
      File "/home/user/.venvs/venv_umitools/lib/python3.6/site-packages/umi_tools/umi_tools.py", line 61, in main
        module.main(sys.argv)
      File "/home/user/.venvs/venv_umitools/lib/python3.6/site-packages/umi_tools/prepare-for-rsem.py", line 203, in main
        outbam.write(key)
    TypeError: Argument 'read' has incorrect type (expected pysam.libcalignedsegment.AlignedSegment, got tuple)

I appreciate your assistance in this matter.

Best,
Aviv.

IanSudbery commented 2 years ago

Ah! Now that I can fix!

I've pushed a new commit to the {IS}_prepare-for-rsem branch. Would you mind pulling it and seeing if it works?

Ian

avivdemorgan commented 2 years ago

Thanks, Ian.
I will do it tomorrow, as soon as I can, and report back to you.
Thanks for your work fixing this!

Best,
Aviv.

avivdemorgan commented 2 years ago

Hi Ian,

This morning I did:
 $ git clone --branch {IS}_prepare-for-rsem https://github.com/CGATOxford/UMI-tools/

cd, and installed:
 $ pip install -r requirements.txt
and still, it fails with the same error on a deduplicated bam file, after sorting (not sort -n or collate), and indexing.

Best,
Aviv.

IanSudbery commented 2 years ago

By "it fails with the same error", do you mean RSEM fails with the same error, or prepare-for-rsem fails with the same error?

prepare-for-rsem can't possibly fail with exactly the same error if the new version has installed correctly, because the line causing the error is no longer the same. This would suggest that the installation hadn't worked.

You can try "python setup.py install"

Can you also confirm that you are doing: position sort -> index -> dedup -> name sort/samtools collate -> prepare-for-rsem?
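To see which copy of UMI-tools Python actually picks up, a small check like the one below can help. It is a generic sketch using only the standard library (`importlib.metadata` needs Python 3.8 or later); `locate_package` is a hypothetical helper, and the distribution name is assumed to match the module name umi_tools. Since the logs in this thread report version 1.1.1 on the branch as well, the file path is likely to be more informative than the version string.

```python
import importlib.util
from importlib import metadata


def locate_package(name):
    """Return (path of the importable module, installed version); None where unknown."""
    spec = importlib.util.find_spec(name)
    origin = spec.origin if spec is not None else None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return origin, version


# Prints (None, None) if umi_tools is not importable in this environment.
print(locate_package("umi_tools"))
```

If the printed path points into an old site-packages directory rather than the freshly installed branch, the traceback will keep showing the old line numbers.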

avivdemorgan commented 2 years ago

Hi Ian,

Thanks for the e-mail.
I will test it again today, perhaps I did something wrong.

Best,
Aviv.

avivdemorgan commented 2 years ago


Hi Ian,

I followed the steps you suggested (indeed, the previous
installation was incorrect), and the error output was:

(venv_umitools) ***@***.***:~/Desktop/umitools/dedup$ umi_tools prepare-for-rsem -I sample_S43_R1_001.Aligned.toTranscriptome.out.sorted.dedup.sorted.bam --stdout=rsem_ready.bam
# UMI-tools version: 1.1.2
# output generated by prepare-for-rsem -I sample_S43_R1_001.Aligned.toTranscriptome.out.sorted.dedup.sorted.bam --stdout=rsem_ready.bam
# job started at Tue Dec  7 10:57:22 2021 on user -- 258f1d18-b6ea-47b0-9cf7-7b1957e60d57
# pid: 696301, system: Linux 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64
# compresslevel                           : 6
# log2stderr                              : False
# loglevel                                : 1
# random_seed                             : None
# sam                                     : False
# short_help                              : None
# stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin                                   : <_io.TextIOWrapper name='sample_S43_R1_001.Aligned.toTranscriptome.out.sorted.dedup.sorted.bam' mode='r' encoding='UTF-8'>
# stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout                                  : <_io.TextIOWrapper name='rsem_ready.bam' mode='w' encoding='UTF-8'>
# tags                                    : UG,BX
# timeit_file                             : None
# timeit_header                           : None
# timeit_name                             : all
# tmpdir                                  : None
[E::idx_find_and_load] Could not retrieve index file for 'sample_S43_R1_001.Aligned.toTranscriptome.out.sorted.dedup.sorted.bam'
Traceback (most recent call last):
  File "/home/user/venv_umitools/bin/umi_tools", line 11, in <module>
    load_entry_point('umi-tools==1.1.2', 'console_scripts', 'umi_tools')()
  File "/home/user/venv_umitools/lib/python3.8/site-packages/umi_tools-1.1.2-py3.8-linux-x86_64.egg/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/home/user/venv_umitools/lib/python3.8/site-packages/umi_tools-1.1.2-py3.8-linux-x86_64.egg/umi_tools/prepare-for-rsem.py", line 164, in main
    mate = current_template[not read.is_read1][mate_key_primary][0]
IndexError: list index out of range

Awaiting your input,
Sincerely,
Aviv.


ctuni commented 2 years ago

Hello!

I have been able to try the new version of the prepare-for-rsem branch and indeed, the previous error no longer appears :)

But some others do. To recap a little, I obtained the transcriptome-aligned BAM files with STAR, position sorted them, created the index file, ran the dedup command, name sorted them, and then ran umi_tools prepare-for-rsem. I also tried running prepare-for-rsem on the deduped file directly, without name sorting it, so that I could create an index and see what happens. The error I obtained using prepare-for-rsem with the name sorted file is the following:

umi_tools prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam
# UMI-tools version: 1.1.2
# output generated by prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam
# job started at Thu Dec  9 12:22:41 2021 on flomics-ThinkPad-L580 -- ddae99d7-c386-42a5-84d7-8f7a78caf60d
# pid: 340867, system: Linux 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64
# compresslevel                           : 6
# log2stderr                              : False
# loglevel                                : 1
# random_seed                             : None
# sam                                     : False
# short_help                              : None
# stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin                                   : <_io.TextIOWrapper name='SRR9113885_dedup_sorted.out.bam' mode='r' encoding='UTF-8'>
# stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout                                  : <_io.TextIOWrapper name='ready_for_rsem.out.bam' mode='w' encoding='UTF-8'>
# tags                                    : UG,BX
# timeit_file                             : None
# timeit_header                           : None
# timeit_name                             : all
# tmpdir                                  : None
[E::idx_find_and_load] Could not retrieve index file for 'SRR9113885_dedup_sorted.out.bam'
Traceback (most recent call last):
  File "/home/ctuni/anaconda3/bin/umi_tools", line 8, in <module>
    sys.exit(main())
  File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/prepare-for-rsem.py", line 164, in main
    mate = current_template[not read.is_read1][mate_key_primary][0]
IndexError: list index out of range

I tried using the deduped BAM that was not name sorted because of the message about not being able to retrieve the index file. I created the index file for the deduped BAM and ran prepare-for-rsem; the message about a missing index did not show, but I think the error was the same:

umi_tools prepare-for-rsem -I SRR9113885_dedup.out.bam --stdout=ready_for_rsem.out.bam
# UMI-tools version: 1.1.2
# output generated by prepare-for-rsem -I SRR9113885_dedup.out.bam --stdout=ready_for_rsem.out.bam
# job started at Thu Dec  9 12:23:45 2021 on flomics-ThinkPad-L580 -- c632d846-0c10-4602-90d4-23cd6ca6f494
# pid: 340936, system: Linux 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64
# compresslevel                           : 6
# log2stderr                              : False
# loglevel                                : 1
# random_seed                             : None
# sam                                     : False
# short_help                              : None
# stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin                                   : <_io.TextIOWrapper name='SRR9113885_dedup.out.bam' mode='r' encoding='UTF-8'>
# stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout                                  : <_io.TextIOWrapper name='ready_for_rsem.out.bam' mode='w' encoding='UTF-8'>
# tags                                    : UG,BX
# timeit_file                             : None
# timeit_header                           : None
# timeit_name                             : all
# tmpdir                                  : None
Traceback (most recent call last):
  File "/home/ctuni/anaconda3/bin/umi_tools", line 8, in <module>
    sys.exit(main())
  File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/home/ctuni/anaconda3/lib/python3.8/site-packages/umi_tools/prepare-for-rsem.py", line 164, in main
    mate = current_template[not read.is_read1][mate_key_primary][0]
IndexError: list index out of range

I think I may be missing something there, so I apologize in advance if I skipped a necessary step to make it work. Thank you very much for the patience and help!
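
For readers puzzling over the traceback: an `IndexError: list index out of range` on a line of that shape is what Python raises when the list stored under a mate key is empty. The sketch below is not the prepare-for-rsem source. It assumes a hypothetical `template` layout (two dicts selected by `is_read1`, each mapping a mate key to a list of alignments) purely to illustrate the failure and a defensive alternative.

```python
from collections import defaultdict


def new_template():
    # Hypothetical layout (an assumption, not the real data structure):
    # template[is_read1][mate_key] -> list of alignment records.
    return {True: defaultdict(list), False: defaultdict(list)}


def find_mate(template, read):
    """Return the first alignment stored under the read's mate key, or None."""
    candidates = template[not read["is_read1"]].get(read["mate_key"], [])
    return candidates[0] if candidates else None


template = new_template()
# A read1 alignment whose mate record is missing from the file.
orphan = {"name": "readA", "is_read1": True,
          "key": ("tx1", 250), "mate_key": ("tx1", 100)}
template[True][orphan["key"]].append(orphan)

try:
    # Same shape as the failing line: indexing [0] into an empty list.
    mate = template[not orphan["is_read1"]][orphan["mate_key"]][0]
except IndexError as err:
    print("unguarded lookup:", err)

# The guarded version reports the missing mate instead of crashing.
print("guarded lookup:", find_mate(template, orphan))
```

Later runs in this thread log "has no mate -- skipped" instead of crashing, which would be consistent with a guard of this kind, though the actual change may differ.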

avivdemorgan commented 2 years ago


Hi Cristina,

Thanks for your e-mail.
I have the same error message, and I e-mailed Ian about this.
I hope Ian will resolve this sometime soon.

Best,
Aviv.


IanSudbery commented 2 years ago

I've pushed another update. It runs without error on my testing file, but it might be helpful to have a better test file to work with if one of you would be willing to share a BAM or a subset of a BAM with me.

avivdemorgan commented 2 years ago


Hi Ian,

Many thanks for your prompt answer!
Sure, I will share some BAM files with you, using Dropbox.
I will do it later on today.
And I am most grateful for your assistance in this issue.

Best wishes,
Aviv.


IanSudbery commented 2 years ago

Has anyone tried the latest PR to see if it works?

ctuni commented 2 years ago

Hello! Thanks for the follow up! I can share some BAM files and also test the new PR tomorrow. Thank you very much and sorry for the delay :)

ctuni commented 2 years ago

Okay! So I have been able to test the new version of the prepare-for-rsem command and it seems to work correctly! The output is the following:

umi_tools prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam
# UMI-tools version: 1.1.2
# output generated by prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam
# job started at Fri Dec 17 16:29:28 2021 on ip-172-31-20-55 -- 308eea8d-1f69-413c-bc58-169b39fd0c7c
# pid: 9532, system: Linux 5.11.0-1022-aws #23~20.04.1-Ubuntu SMP Mon Nov 15 14:03:19 UTC 2021 x86_64
# compresslevel                           : 6
# log2stderr                              : False
# loglevel                                : 1
# random_seed                             : None
# sam                                     : False
# short_help                              : None
# stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin                                   : <_io.TextIOWrapper name='SRR9113885_dedup_sorted.out.bam' mode='r' encoding='UTF-8'>
# stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout                                  : <_io.TextIOWrapper name='ready_for_rsem.out.bam' mode='w' encoding='UTF-8'>
# tags                                    : UG,BX
# timeit_file                             : None
# timeit_header                           : None
# timeit_name                             : all
# tmpdir                                  : None
[E::idx_find_and_load] Could not retrieve index file for 'SRR9113885_dedup_sorted.out.bam'
2021-12-17 16:29:38,323 WARNING Alignment SRR9113885.8167_CCTCATGGTC    83      ENST00000417570 211 has no mate -- skipped
2021-12-17 16:29:46,455 WARNING Alignment SRR9113885.189982_AGTTAGATGT  83      ENST00000652551 2697 has no mate -- skipped
2021-12-17 16:29:46,456 WARNING Alignment SRR9113885.189984_GTCTCGATGT  83      ENST00000480246 1934 has no mate -- skipped
2021-12-17 16:29:46,456 WARNING Alignment SRR9113885.189986_TGATGTCTAA  83      ENST00000652551 2320 has no mate -- skipped
2021-12-17 16:29:50,418 WARNING Alignment SRR9113885.232824_TGTCAAGGGC  83      ENST00000536769 3536 has no mate -- skipped
2021-12-17 16:29:50,419 WARNING Alignment SRR9113885.232826_TGCAGGGTGG  83      ENST00000538617 1134 has no mate -- skipped
2021-12-17 16:29:57,019 WARNING Alignment SRR9113885.358443_GCCAGCTGTT  83      ENST00000536769 3250 has no mate -- skipped
2021-12-17 16:30:03,023 WARNING Alignment SRR9113885.505059_TGCCAGTGAG  83      ENST00000339647 1846 has no mate -- skipped
2021-12-17 16:30:08,450 WARNING Alignment SRR9113885.598608_GGGTCTTGGT  83      ENST00000619423 7081 has no mate -- skipped
2021-12-17 16:30:11,483 WARNING Alignment SRR9113885.640196_TGGGACTTTT  83      ENST00000583866 4197 has no mate -- skipped
2021-12-17 16:30:14,593 WARNING Alignment SRR9113885.736786_GTGTGTGGTG  83      ENST00000567736 2148 has no mate -- skipped
2021-12-17 16:30:19,006 WARNING Alignment SRR9113885.821797_TGGTACTTTT  83      ENST00000617010 6393 has no mate -- skipped
2021-12-17 16:30:25,211 WARNING Alignment SRR9113885.969358_GTGGGGTGCG  83      ENST00000567736 1821 has no mate -- skipped
2021-12-17 16:30:47,795 WARNING Alignment SRR9113885.1108129_TTCTGGATGT 83      ENST00000339647 1777 has no mate -- skipped
2021-12-17 16:30:47,795 WARNING Alignment SRR9113885.1108130_TTCTGGATGT 83      ENST00000536769 3059 has no mate -- skipped
2021-12-17 16:30:47,796 WARNING Alignment SRR9113885.1108131_TCTGGATGTT 83      ENST00000339647 1546 has no mate -- skipped
2021-12-17 16:30:47,797 WARNING Alignment SRR9113885.1108134_GGATGTTGTA 83      ENST00000536769 3281 has no mate -- skipped
2021-12-17 16:30:47,799 WARNING Alignment SRR9113885.1108138_TGTCCATCTT 83      ENST00000339647 1748 has no mate -- skipped
2021-12-17 16:30:50,814 WARNING Alignment SRR9113885.1166981_TGTTGTAGTC 83      ENST00000339647 859 has no mate -- skipped
2021-12-17 16:31:05,304 WARNING Alignment SRR9113885.1451517_GCTCCCTTAT 83      ENST00000568624 2465 has no mate -- skipped
2021-12-17 16:31:06,167 WARNING Alignment SRR9113885.1473660_GTCCACTTTT 83      ENST00000378024 13068 has no mate -- skipped
2021-12-17 16:31:06,167 WARNING Alignment SRR9113885.1473661_GTCCACTTTT 83      ENST00000378024 13068 has no mate -- skipped
2021-12-17 16:31:43,303 WARNING Alignment SRR9113885.2203956_TGTGTGGGGT 83      ENST00000567736 1941 has no mate -- skipped
2021-12-17 16:32:26,064 WARNING Alignment SRR9113885.3073150_TAGGGGTCTG 83      ENST00000567736 2259 has no mate -- skipped
2021-12-17 16:32:26,588 WARNING Alignment SRR9113885.3083739_CTTGGGTCTT 83      ENST00000536769 3191 has no mate -- skipped
2021-12-17 16:32:47,692 WARNING Alignment SRR9113885.3431410_GCCTTGACAT 83      ENST00000536769 3184 has no mate -- skipped
2021-12-17 16:32:47,693 WARNING Alignment SRR9113885.3431411_GACATTCTCA 83      ENST00000536769 3175 has no mate -- skipped
2021-12-17 16:32:47,693 WARNING Alignment SRR9113885.3431413_AGCTCCACTT 83      ENST00000339647 1644 has no mate -- skipped
2021-12-17 16:32:53,268 WARNING Alignment SRR9113885.3511521_CTGCCCACGA 83      ENST00000513405 225 has no mate -- skipped
2021-12-17 16:32:53,874 WARNING Alignment SRR9113885.3523928_GTCCAGCTGT 83      ENST00000339647 1968 has no mate -- skipped
2021-12-17 16:32:57,301 INFO Total pairs output: 2734136, Pairs skipped - no mates: 30, Pairs skipped - not read1 or 2: 0
# job finished in 208 seconds at Fri Dec 17 16:32:57 2021 -- 208.75  3.53  0.00  0.00 -- 308eea8d-1f69-413c-bc58-169b39fd0c7c

I find those warning messages confusing because I made sure to remove singleton reads, but anyway, I ended up with the ready_for_rsem.out.bam file I wanted!
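
One possible reading of these warnings, offered as an assumption rather than a diagnosis: every skipped alignment in the log carries SAM flag 83, and decoding it shows a read1 record that is still marked as paired with a mapped mate. `samtools flagstat` classifies singletons from flag bits alone, so it would not notice a mate whose record is simply absent from the file. The bit names below follow the SAM specification; the helper name `decode_flag` is made up for this example.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


# Flag 83 is the value shown on every "has no mate" warning in this log.
print(83, decode_flag(83))
```

Because `mate_unmapped` is not set in flag 83, such a record would not be counted as a singleton even when its mate is missing.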

I then launched rsem-calculate-expression --paired-end --num-threads 15 --temporary-folder tmp/ --alignments ready_for_rsem.out.bam RSEM/GRCh38_ref final and redirected the standard output to a log file because it was really long, but the final message was:

rsem-run-em: RefSeq.h:85: int RefSeq::get_id(int, int) const: Assertion `pos >= 0 && pos < totLen' failed.

To gain a little bit of insight on the difference of the files before and after umi_tools prepare-for-rsem, here's the output of samtools flagstat of both input and output files of the command:

$ samtools flagstat ready_for_rsem.out.bam
5468272 + 0 in total (QC-passed reads + QC-failed reads)
2086706 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5468272 + 0 mapped (100.00% : N/A)
3381566 + 0 paired in sequencing
1690783 + 0 read1
1690783 + 0 read2
3381566 + 0 properly paired (100.00% : N/A)
3381566 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

$ samtools flagstat SRR9113885_dedup_sorted.out.bam 
3707176 + 0 in total (QC-passed reads + QC-failed reads)
1870528 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
3707176 + 0 mapped (100.00% : N/A)
1836648 + 0 paired in sequencing
918339 + 0 read1
918309 + 0 read2
1836648 + 0 properly paired (100.00% : N/A)
1836648 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
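
The two reports can be cross-checked against the prepare-for-rsem summary line with a little arithmetic. The numbers below are copied from the outputs in this comment; the interpretation in the code comments is one reading of those numbers, not something confirmed by the tool's authors.

```python
# Counts copied from the two flagstat reports and the prepare-for-rsem log.
before = {"total": 3707176, "secondary": 1870528, "read1": 918339, "read2": 918309}
after = {"total": 5468272, "secondary": 2086706, "read1": 1690783, "read2": 1690783}
pairs_output = 2734136
pairs_skipped_no_mate = 30


def summarize(before, after):
    """Compare read1/read2 balance and record counts of the two files."""
    return {
        # read1 records without a matching read2 record in the input
        "read1_surplus_before": before["read1"] - before["read2"],
        # the output should be balanced if every record has its mate
        "read1_surplus_after": after["read1"] - after["read2"],
        # records present in the output but not in the input
        "records_added": after["total"] - before["total"],
    }


summary = summarize(before, after)
print(summary)
# The read1 surplus in the input equals the number of skipped pairs,
# and the output holds exactly two records per reported pair.
print(summary["read1_surplus_before"] == pairs_skipped_no_mate)
print(2 * pairs_output == after["total"])
```

Read this way, the output file is larger than the input because mates that were dropped during deduplication appear to have been written back.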

Please find attached in this Google Drive folder the following files:

SRR9113885_Aligned.toTranscriptome.out.bam : The initial BAM file as output by STAR
SRR9113885_dedup_sorted.out.bam : The processed BAM file after sorting, indexing, deduplicating, and name sorting, used as the input for prepare-for-rsem
ready_for_rsem.out.bam : The BAM file that is output by prepare-for-rsem and given to rsem-calculate-expression
rsem.log : The messages output by RSEM

As always, thank you very much for your time and help! :)

avivdemorgan commented 2 years ago


Hi Ian,

Thanks for the e-mail.
Which branch to pull?

Best,
Aviv.


avivdemorgan commented 2 years ago


Hi Ian,

I pulled branch {IS}_prepare_for_rsem and installed.
This time, prepare-for-rsem completed without errors (last lines only):
.
.
2021-12-19 12:29:57,111 WARNING Alignment A01032:183:HKT5GDRXY:2:2278:20573:31015_TTATCGGG    99    ENST00000581296.2    3251 has no mate -- skipped
2021-12-19 12:29:57,752 WARNING Alignment A01032:183:HKT5GDRXY:2:2278:29460:30342_CCTCTCCG    99    ENST00000283195.11    4960 has no mate -- skipped
2021-12-19 12:29:58,085 INFO Total pairs output: 2354942, Pairs skipped - no mates: 415, Pairs skipped - not read1 or 2: 0
# job finished in 538 seconds at Sun Dec 19 12:29:58 2021 -- 534.52  3.09  0.00  0.00 -- 77b19c92-7519-47f6-bdef-698c43c3a912

And then rsem-calculate-expression gave this:

(venv_umitools) ***@***.***:~/Desktop/umitools/dedup$ rsem-calculate-expression -p 2 --no-bam-output --paired-end --alignments rsem_ready.bam /home/aviv/Desktop/umitools/rsem_ref_gencode_v38/rsem_ref_gencode counts
rsem-parse-alignments /home/aviv/Desktop/umitools/rsem_ref_gencode_v38/rsem_ref_gencode counts.temp/counts counts.stat/counts rsem_ready.bam 3 -tag XM
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG and A01032:183:HKT5GDRXY:2:2252:1651:15812_GTCGCTAA!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG and A01032:183:HKT5GDRXY:2:2252:1651:15812_GTCGCTAA!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG and A01032:183:HKT5GDRXY:2:2228:29812:27258_CTAGTATT!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:2:2252:1651:15812_GTCGCTAA and A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:2:2252:1651:15812_GTCGCTAA and A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:2:2228:29812:27258_CTAGTATT and A01032:183:HKT5GDRXY:1:2101:1181:27618_GAGACTGG!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1805:19460_CTCATGGA and A01032:183:HKT5GDRXY:1:2153:16749:21042_CCCCTAGA!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2153:16749:21042_CCCCTAGA and A01032:183:HKT5GDRXY:1:2101:1805:19460_CTCATGGA!
Warning: Detected a read pair whose two mates have different names--A01032:183:HKT5GDRXY:1:2101:1805:24189_TTCTCGCA and A01032:183:HKT5GDRXY:2:2109:17951:22874_AGATGGAT!
Paired-end read A01032:183:HKT5GDRXY:1:2101:1805:24189_TTCTCGCA has alignments with inconsistent mate lengths!
"rsem-parse-alignments /home/aviv/Desktop/umitools/rsem_ref_gencode_v38/rsem_ref_gencode counts.temp/counts counts.stat/counts rsem_ready.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!

I think this error is equivalent to yours and differs only in form, since we probably sorted the deduplicated BAMs differently, i.e., samtools sort -n vs. samtools collate.
I hope this is helpful.
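
RSEM's complaint is about adjacency: as the error quoted at the top of this issue says, it assumes the two mates of a paired-end read are on adjacent lines. A quick way to see where a stream of records breaks that assumption is sketched below. It works on plain `(query_name, is_read1)` tuples so that it stays self-contained; feeding it from a BAM file (for example through pysam) is left out, and the requirement that read1 comes before read2 is an assumption of this sketch.

```python
def find_unpaired(records):
    """Return (index, name, next_name) for records that do not start a valid pair.

    A valid pair is a read1 record immediately followed by a read2 record
    with the same query name (the read1-first order is an assumption).
    """
    problems = []
    i = 0
    while i < len(records):
        name_a, a_is_read1 = records[i]
        if i + 1 == len(records):
            # Last record has nothing after it to pair with.
            problems.append((i, name_a, None))
            break
        name_b, b_is_read1 = records[i + 1]
        if name_a == name_b and a_is_read1 and not b_is_read1:
            i += 2  # consume the whole pair
        else:
            problems.append((i, name_a, name_b))
            i += 1  # resynchronise on the next record
    return problems


# Made-up read names: "b" has lost its mate, so it sits next to "c".
records = [("a", True), ("a", False), ("b", True), ("c", True), ("c", False)]
print(find_unpaired(records))
```

Resynchronising one record at a time after a mismatch keeps a single orphan from cascading into a report for every pair that follows it.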

Best wishes,
Aviv.

On 17/12/2021 18:28, Cristina Tuñí i
  Domínguez wrote:

  Okay! So I have been able to test the new version of the
    prepare-for-rsem command and it seems to work
    correctly! The output is the following:
  umi_tools prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam

UMI-tools version: 1.1.2

output generated by prepare-for-rsem -I SRR9113885_dedup_sorted.out.bam --stdout=ready_for_rsem.out.bam

job started at Fri Dec 17 16:29:28 2021 on ip-172-31-20-55 -- 308eea8d-1f69-413c-bc58-169b39fd0c7c

pid: 9532, system: Linux 5.11.0-1022-aws #23~20.04.1-Ubuntu SMP Mon Nov 15 14:03:19 UTC 2021 x86_64

compresslevel : 6

log2stderr : False

loglevel : 1

random_seed : None

sam : False

short_help : None

stderr : <_io.TextIOWrapper name='' mode='w' encoding='utf-8'>

stdin : <_io.TextIOWrapper name='SRR9113885_dedup_sorted.out.bam' mode='r' encoding='UTF-8'>

stdlog : <_io.TextIOWrapper name='' mode='w' encoding='utf-8'>

stdout : <_io.TextIOWrapper name='ready_for_rsem.out.bam' mode='w' encoding='UTF-8'>

tags : UG,BX

timeit_file : None

timeit_header : None

timeit_name : all

tmpdir : None

[E::idx_find_and_load] Could not retrieve index file for 'SRR9113885_dedup_sorted.out.bam'
2021-12-17 16:29:38,323 WARNING Alignment SRR9113885.8167_CCTCATGGTC 83 ENST00000417570 211 has no mate -- skipped
2021-12-17 16:29:46,455 WARNING Alignment SRR9113885.189982_AGTTAGATGT 83 ENST00000652551 2697 has no mate -- skipped
2021-12-17 16:29:46,456 WARNING Alignment SRR9113885.189984_GTCTCGATGT 83 ENST00000480246 1934 has no mate -- skipped
2021-12-17 16:29:46,456 WARNING Alignment SRR9113885.189986_TGATGTCTAA 83 ENST00000652551 2320 has no mate -- skipped
2021-12-17 16:29:50,418 WARNING Alignment SRR9113885.232824_TGTCAAGGGC 83 ENST00000536769 3536 has no mate -- skipped
2021-12-17 16:29:50,419 WARNING Alignment SRR9113885.232826_TGCAGGGTGG 83 ENST00000538617 1134 has no mate -- skipped
2021-12-17 16:29:57,019 WARNING Alignment SRR9113885.358443_GCCAGCTGTT 83 ENST00000536769 3250 has no mate -- skipped
2021-12-17 16:30:03,023 WARNING Alignment SRR9113885.505059_TGCCAGTGAG 83 ENST00000339647 1846 has no mate -- skipped
2021-12-17 16:30:08,450 WARNING Alignment SRR9113885.598608_GGGTCTTGGT 83 ENST00000619423 7081 has no mate -- skipped
2021-12-17 16:30:11,483 WARNING Alignment SRR9113885.640196_TGGGACTTTT 83 ENST00000583866 4197 has no mate -- skipped
2021-12-17 16:30:14,593 WARNING Alignment SRR9113885.736786_GTGTGTGGTG 83 ENST00000567736 2148 has no mate -- skipped
2021-12-17 16:30:19,006 WARNING Alignment SRR9113885.821797_TGGTACTTTT 83 ENST00000617010 6393 has no mate -- skipped
2021-12-17 16:30:25,211 WARNING Alignment SRR9113885.969358_GTGGGGTGCG 83 ENST00000567736 1821 has no mate -- skipped
2021-12-17 16:30:47,795 WARNING Alignment SRR9113885.1108129_TTCTGGATGT 83 ENST00000339647 1777 has no mate -- skipped
2021-12-17 16:30:47,795 WARNING Alignment SRR9113885.1108130_TTCTGGATGT 83 ENST00000536769 3059 has no mate -- skipped
2021-12-17 16:30:47,796 WARNING Alignment SRR9113885.1108131_TCTGGATGTT 83 ENST00000339647 1546 has no mate -- skipped
2021-12-17 16:30:47,797 WARNING Alignment SRR9113885.1108134_GGATGTTGTA 83 ENST00000536769 3281 has no mate -- skipped
2021-12-17 16:30:47,799 WARNING Alignment SRR9113885.1108138_TGTCCATCTT 83 ENST00000339647 1748 has no mate -- skipped
2021-12-17 16:30:50,814 WARNING Alignment SRR9113885.1166981_TGTTGTAGTC 83 ENST00000339647 859 has no mate -- skipped
2021-12-17 16:31:05,304 WARNING Alignment SRR9113885.1451517_GCTCCCTTAT 83 ENST00000568624 2465 has no mate -- skipped
2021-12-17 16:31:06,167 WARNING Alignment SRR9113885.1473660_GTCCACTTTT 83 ENST00000378024 13068 has no mate -- skipped
2021-12-17 16:31:06,167 WARNING Alignment SRR9113885.1473661_GTCCACTTTT 83 ENST00000378024 13068 has no mate -- skipped
2021-12-17 16:31:43,303 WARNING Alignment SRR9113885.2203956_TGTGTGGGGT 83 ENST00000567736 1941 has no mate -- skipped
2021-12-17 16:32:26,064 WARNING Alignment SRR9113885.3073150_TAGGGGTCTG 83 ENST00000567736 2259 has no mate -- skipped
2021-12-17 16:32:26,588 WARNING Alignment SRR9113885.3083739_CTTGGGTCTT 83 ENST00000536769 3191 has no mate -- skipped
2021-12-17 16:32:47,692 WARNING Alignment SRR9113885.3431410_GCCTTGACAT 83 ENST00000536769 3184 has no mate -- skipped
2021-12-17 16:32:47,693 WARNING Alignment SRR9113885.3431411_GACATTCTCA 83 ENST00000536769 3175 has no mate -- skipped
2021-12-17 16:32:47,693 WARNING Alignment SRR9113885.3431413_AGCTCCACTT 83 ENST00000339647 1644 has no mate -- skipped
2021-12-17 16:32:53,268 WARNING Alignment SRR9113885.3511521_CTGCCCACGA 83 ENST00000513405 225 has no mate -- skipped
2021-12-17 16:32:53,874 WARNING Alignment SRR9113885.3523928_GTCCAGCTGT 83 ENST00000339647 1968 has no mate -- skipped
2021-12-17 16:32:57,301 INFO Total pairs output: 2734136, Pairs skipped - no mates: 30, Pairs skipped - not read1 or 2: 0

job finished in 208 seconds at Fri Dec 17 16:32:57 2021 -- 208.75 3.53 0.00 0.00 -- 308eea8d-1f69-413c-bc58-169b39fd0c7c

  I find those warning messages confusing because I
    made sure to remove singleton reads, but anyway, I ended up with
    the ready_for_rsem.out.bam file I wanted!
  I then launched rsem-calculate-expression
      --paired-end --num-threads 15 --temporary-folder tmp/
      --alignments ready_for_rsem.out.bam RSEM/GRCh38_ref final
    and redirected the standard output to a log file because it was
    really long, but the final message was
  rsem-run-em: RefSeq.h:85: int RefSeq::get_id(int, int) const: Assertion `pos >= 0 && pos < totLen' failed.
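
The assertion text suggests a position falling outside the length of a reference sequence. Whether that is what happened here is not confirmed; as one possible diagnostic, the sketch below flags alignments whose coordinates lie outside their transcript. It assumes records reduced to `(query_name, ref_name, start, end)` tuples with 0-based, end-exclusive coordinates, and `out_of_range` is a hypothetical helper name:

```python
def out_of_range(alignments, ref_lengths):
    """Yield alignments whose coordinates fall outside their reference.

    `alignments` holds (query_name, ref_name, start, end) tuples using
    0-based, end-exclusive coordinates. `ref_lengths` maps reference
    names to sequence lengths. Alignments to unknown references are
    reported as well.
    """
    for name, ref, start, end in alignments:
        length = ref_lengths.get(ref)
        if length is None or start < 0 or end > length:
            yield (name, ref, start, end)
```

With pysam, `ref_lengths` could be built as `dict(zip(bam.references, bam.lengths))` and the tuples from `read.query_name`, `read.reference_name`, `read.reference_start` and `read.reference_end`.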

  To give a little insight into the differences between
    the files before and after umi_tools prepare-for-rsem,
    here is the output of samtools flagstat for both the
    input and output files of the command:
  $ samtools flagstat ready_for_rsem.out.bam

5468272 + 0 in total (QC-passed reads + QC-failed reads)
2086706 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5468272 + 0 mapped (100.00% : N/A)
3381566 + 0 paired in sequencing
1690783 + 0 read1
1690783 + 0 read2
3381566 + 0 properly paired (100.00% : N/A)
3381566 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

  $ samtools flagstat SRR9113885_dedup_sorted.out.bam

3707176 + 0 in total (QC-passed reads + QC-failed reads)
1870528 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
3707176 + 0 mapped (100.00% : N/A)
1836648 + 0 paired in sequencing
918339 + 0 read1
918309 + 0 read2
1836648 + 0 properly paired (100.00% : N/A)
1836648 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
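
The flagstat figures can also be compared programmatically. A minimal sketch, assuming the usual one-category-per-line flagstat layout; `flagstat_counts` is a hypothetical helper, not part of samtools or UMI-tools. For the input file, read1 and read2 differ by 30, the same figure as the "no mates: 30" reported by prepare-for-rsem, although a causal link between the two is not established here:

```python
import re

# Excerpt of the flagstat output for SRR9113885_dedup_sorted.out.bam
INPUT_FLAGSTAT = """\
3707176 + 0 in total (QC-passed reads + QC-failed reads)
1870528 + 0 secondary
1836648 + 0 paired in sequencing
918339 + 0 read1
918309 + 0 read2
"""

# "<passed> + <failed> <label>", optionally followed by a parenthetical
LINE = re.compile(r"\s*(\d+) \+ \d+ ([^(]+?)\s*(?:\(.*)?$")


def flagstat_counts(text):
    """Map each flagstat label to its QC-passed count.

    Parenthetical suffixes such as "(100.00% : N/A)" are dropped from
    the label; if two lines reduce to the same label, the first wins.
    """
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counts.setdefault(match.group(2), int(match.group(1)))
    return counts


counts = flagstat_counts(INPUT_FLAGSTAT)
print("read1 - read2 =", counts["read1"] - counts["read2"])
```

The same helper applied to the output file gives a total of 5468272 records, which is twice the 2734136 pairs that prepare-for-rsem reports writing.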

  Please find attached in this Google Drive folder
    the following files:

    SRR9113885_Aligned.toTranscriptome.out.bam: the initial BAM file as output by STAR
    SRR9113885_dedup_sorted.out.bam: the processed BAM file after sorting, indexing, deduplicating, and name sorting; the input for prepare-for-rsem
    ready_for_rsem.out.bam: the BAM file output by prepare-for-rsem and given to rsem-calculate-expression
    rsem.log: the messages output by RSEM

  As always, thank you very much for your time and
    help! :)

--

Aviv De Morgan, Ph.D Head of Bioinformatics Pyxis Diagnostics High Tech Village Givat-Ram, POB 39158 Jerusalem 91391 Israel +972-2-6553333 @.***

CGATOxford / UMI-tools

Problems with dedup output and RSEM #465
