OceanGenomics / mudskipper

A tool for projecting genomic alignments to transcriptomic coordinates
BSD 3-Clause "New" or "Revised" License

fix issue #24 (#29)

Closed: gmarcais closed this 2 years ago

gmarcais commented 2 years ago
rob-p commented 2 years ago

We should also see how the solution here interacts with, duplicates, or complements what is going on in #26.

gmarcais commented 2 years ago

PR #26 treats incorrectly mated reads (two cases: missing mates and mis-ordered mates) as single-end reads. This PR, on the other hand, fails loudly with an explanatory error message.

The question boils down to what to do when we encounter a poorly formatted BAM: silently handle the data as best we can, or fail early to avoid unexpected behavior. This PR implements the latter.
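As an illustration of the fail-early approach, here is a minimal Rust sketch of a mate check over records assumed to be grouped by read name. The `Rec` struct, `Problem` enum, and `check_mates` function are hypothetical simplifications, not mudskipper's actual code; a real implementation would read SAM flags 0x1, 0x40 and 0x80 from BAM records through its BAM library.

```rust
use std::fmt;

/// Hypothetical, stripped-down view of a BAM record (not mudskipper's type).
#[derive(Debug, Clone)]
pub struct Rec {
    pub qname: String,
    pub paired: bool, // SAM flag 0x1: read is paired
    pub first: bool,  // SAM flag 0x40: first mate
    pub last: bool,   // SAM flag 0x80: second mate
}

/// The two malformed-pairing cases discussed in this thread.
#[derive(Debug, PartialEq)]
pub enum Problem {
    MissingMate(String),
    MisorderedMates(String),
}

impl fmt::Display for Problem {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Problem::MissingMate(q) => write!(f, "read {}: mate not found on the next record", q),
            Problem::MisorderedMates(q) => write!(f, "read {}: mates are not in first/second order", q),
        }
    }
}

/// Checks that every paired read is immediately followed by its mate,
/// first mate then second mate. Returns the number of proper pairs, or the
/// first problem seen (fail early instead of reinterpreting the data).
pub fn check_mates(recs: &[Rec]) -> Result<usize, Problem> {
    let mut i = 0;
    let mut pairs = 0;
    while i < recs.len() {
        let r = &recs[i];
        if !r.paired {
            i += 1; // single-end read: nothing to check
            continue;
        }
        match recs.get(i + 1) {
            // Next record has the same name: it must be the second mate.
            Some(m) if m.qname == r.qname => {
                if r.first && m.last {
                    pairs += 1;
                    i += 2;
                } else {
                    return Err(Problem::MisorderedMates(r.qname.clone()));
                }
            }
            // Next record belongs to another read, or the input ended.
            _ => return Err(Problem::MissingMate(r.qname.clone())),
        }
    }
    Ok(pairs)
}
```

A caller would print the `Display` message and exit on the first `Err`, which is the "fail early" option; the alternative in #26 would instead downgrade the offending records to single-end reads and keep going.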

Maybe each type of formatting problem deserves a different treatment?
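One way to give each kind of formatting problem its own treatment is a per-case policy. The Rust sketch below is hypothetical: `Case`, `Action`, and `MatePolicy` are illustrative names, not options exposed by this PR or by #26. `Fail` mirrors this PR's behavior and `AsSingle` mirrors #26's.

```rust
/// The two kinds of malformed pairing discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Case {
    MissingMate,
    MisorderedMates,
}

/// Possible treatments for a malformed pair (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Fail,     // stop with an explanatory error (this PR)
    AsSingle, // keep the records as single-end reads (#26)
    Skip,     // drop the affected records
}

/// Hypothetical per-case configuration.
#[derive(Debug, Clone, Copy)]
pub struct MatePolicy {
    pub missing_mate: Action,
    pub misordered: Action,
}

impl MatePolicy {
    /// Strict default: fail loudly on any malformed pairing.
    pub fn strict() -> Self {
        MatePolicy { missing_mate: Action::Fail, misordered: Action::Fail }
    }

    /// Looks up the action for `case`; `Fail` becomes an error naming the read.
    pub fn decide(&self, case: Case, qname: &str) -> Result<Action, String> {
        let (action, why) = match case {
            Case::MissingMate => (self.missing_mate, "mate is missing"),
            Case::MisorderedMates => (self.misordered, "mates are out of order"),
        };
        match action {
            Action::Fail => Err(format!("read {}: {}; refusing to guess how to pair it", qname, why)),
            other => Ok(other),
        }
    }
}
```

Keeping `strict()` as the default preserves the fail-early behavior, while a user who knows their input could opt into `Skip` or `AsSingle` for one case without relaxing the other.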

jelber2 commented 2 years ago

I had to do the following to get 7aa09c2 to compile (Rust complained that there were two versions of bio_types, 0.12.1 and 0.13.0):

rustup show

Default host: x86_64-unknown-linux-gnu
rustup home:  /nfs/scistore16/itgrp/jelbers/.rustup

installed toolchains
--------------------

stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.62.1 (e092d0b6b 2022-07-16)

git clone https://github.com/OceanGenomics/mudskipper
cd mudskipper/
git fetch origin pull/29/head:29
git checkout 29
git checkout 7aa09c2
perl -pi -e 's/bio-types = "0.12.1"/bio-types = "0.13.0"/' Cargo.toml
cargo build --release
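Why two versions of `bio_types` break the build: Cargo compiles each version as a separate crate, so a type from 0.12.1 and the same-named type from 0.13.0 are unrelated as far as the compiler is concerned (`cargo tree --duplicates` lists crates that end up in the build more than once). The sketch below simulates this with two local modules; the module names and the simplified `Strand` enum are stand-ins, not the real crates. Aligning the version in `Cargo.toml`, as the `perl` one-liner does, removes the mismatch without any conversion code.

```rust
// Stand-ins for two versions of the same crate (hypothetical). Identical
// names and definitions still produce two distinct types.
pub mod bio_types_v0_12 {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum Strand { Forward, Reverse, Unknown }
}

pub mod bio_types_v0_13 {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum Strand { Forward, Reverse, Unknown }
}

/// Compiled against the newer "version": only accepts its own `Strand`.
pub fn strand_symbol(s: bio_types_v0_13::Strand) -> char {
    match s {
        bio_types_v0_13::Strand::Forward => '+',
        bio_types_v0_13::Strand::Reverse => '-',
        bio_types_v0_13::Strand::Unknown => '.',
    }
}

// `strand_symbol(bio_types_v0_12::Strand::Forward)` would fail with
// "mismatched types". Without unifying the versions, every crossing point
// needs an explicit conversion such as this one.
pub fn upgrade(s: bio_types_v0_12::Strand) -> bio_types_v0_13::Strand {
    match s {
        bio_types_v0_12::Strand::Forward => bio_types_v0_13::Strand::Forward,
        bio_types_v0_12::Strand::Reverse => bio_types_v0_13::Strand::Reverse,
        bio_types_v0_12::Strand::Unknown => bio_types_v0_13::Strand::Unknown,
    }
}
```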
jelber2 commented 2 years ago

The skip option is handy (e.g., if STAR was run with --outSAMtype BAM Unsorted), since the downstream output might not work with Salmon if it contains the mixture of paired and unpaired reads that #26 produces.

rob-p commented 2 years ago

OK, things look good here. I'm going to go ahead with the merge. Thank you, @yfei-w!