Closed gmarcais closed 2 years ago
We should also see how the solution here interacts with / duplicates / complements what is going on in #26.
PR #26 uses the incorrectly mated reads (2 cases: missing mates and mis-ordered mates) as single reads. This PR, on the other hand, fails loudly with a message explaining the problem.
The question boils down to what to do when we encounter a poorly formatted BAM: silently handle the data as best we can, or fail early to avoid unexpected behavior. This PR implements the latter.
Maybe each type of formatting deserves a different treatment?
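To make the trade-off concrete, here is a minimal sketch of how the choice could be expressed as a per-call policy. It is not the code from this PR or from #26: `Rec`, `MatePolicy`, `Fragment`, `MateError` and `group_mates` are hypothetical names, the record type is a stand-in rather than a real BAM record, and it assumes name-grouped input where "mis-ordered" simply means that two adjacent same-name records are not in mate 1, mate 2 order.

```rust
/// Hypothetical stand-in for a BAM record: only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
struct Rec {
    name: String,
    first_in_pair: bool,
}

/// What to do with a read whose mate is missing or out of order.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MatePolicy {
    FailEarly, // stop with an explanatory error
    AsSingle,  // keep the reads, but treat them as single-end
    Skip,      // drop the offending reads
}

#[derive(Debug, PartialEq)]
enum Fragment {
    Paired(Rec, Rec),
    Single(Rec),
}

#[derive(Debug, PartialEq)]
enum MateError {
    MissingMate(String),
    MisorderedMates(String),
}

/// Walk name-grouped records and pair adjacent mates according to `policy`.
fn group_mates(recs: &[Rec], policy: MatePolicy) -> Result<Vec<Fragment>, MateError> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < recs.len() {
        let cur = &recs[i];
        // The candidate mate is the next record, if it carries the same name.
        let next = recs.get(i + 1).filter(|n| n.name == cur.name);
        match next {
            // Well-formed pair: mate 1 followed by mate 2.
            Some(n) if cur.first_in_pair && !n.first_in_pair => {
                out.push(Fragment::Paired(cur.clone(), n.clone()));
                i += 2;
            }
            // Same name, but not in mate 1 / mate 2 order.
            Some(n) => {
                match policy {
                    MatePolicy::FailEarly => {
                        return Err(MateError::MisorderedMates(cur.name.clone()))
                    }
                    MatePolicy::AsSingle => {
                        out.push(Fragment::Single(cur.clone()));
                        out.push(Fragment::Single(n.clone()));
                    }
                    MatePolicy::Skip => {}
                }
                i += 2;
            }
            // No adjacent record with the same name: the mate is missing.
            None => {
                match policy {
                    MatePolicy::FailEarly => {
                        return Err(MateError::MissingMate(cur.name.clone()))
                    }
                    MatePolicy::AsSingle => out.push(Fragment::Single(cur.clone())),
                    MatePolicy::Skip => {}
                }
                i += 1;
            }
        }
    }
    Ok(out)
}
```

Because the two failure cases are separate match arms, giving each type of formatting problem its own treatment would only require passing one policy per case instead of a single shared one.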
I had to do the following to get 7aa09c2 to compile (Rust complained that there were 2 versions of bio_types: 0.12.1 and 0.13.0):
rustup show
Default host: x86_64-unknown-linux-gnu
rustup home: /nfs/scistore16/itgrp/jelbers/.rustup
installed toolchains
--------------------
stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu
active toolchain
----------------
stable-x86_64-unknown-linux-gnu (default)
rustc 1.62.1 (e092d0b6b 2022-07-16)
git clone https://github.com/OceanGenomics/mudskipper
cd mudskipper/
git fetch origin pull/29/head:29
git checkout 29
git checkout 7aa09c2
perl -pi -e 's/bio-types = "0.12.1"/bio-types = "0.13.0"/' Cargo.toml
cargo build --release
I must say that the skip option is handy (if STAR was run with --outSAMtype BAM Unsorted), as downstream output might not work with Salmon if one uses a mixture of paired and unpaired reads generated with #26.
Ok; things look good here. I'm going to go ahead with the merge. Thank you @yfei-w!