Closed fdchevalier closed 2 years ago
Unfortunately, we are not currently updating this code and are not maintaining compatibility with newer versions of dependencies. You are welcome to branch the repo and make changes suited to your needs. In practice, we find that single-end reads are more than sufficient to infer ancestry in most cases.
Hi MSG team,
As mentioned in a previous issue (#46), I am using paired-end sequences. The alignment step treats these as single-end sequences. Alignment accuracy improves when the reads are aligned as true pairs rather than as single-end reads. As paired-end sequencing is now common, it would be worthwhile to adapt the pipeline to take advantage of it.
When experimenting with the pipeline, I identified the scripts that need updating to achieve this. As I was using `bwa mem`, I added the `-p` option to the `bwa mem` lines:
https://github.com/JaneliaSciComp/msg/blob/e6fedcb88f51d7f7a876f7f2c62f18b96dd06e6b/msg.pl#L312
https://github.com/JaneliaSciComp/msg/blob/e6fedcb88f51d7f7a876f7f2c62f18b96dd06e6b/parse_and_map.py#L626
https://github.com/JaneliaSciComp/msg/blob/e6fedcb88f51d7f7a876f7f2c62f18b96dd06e6b/parse_and_map.py#L632
This was only adapted to my needs. A proper implementation will require more work: the other methods will need to be adapted too, and single-end read alignment will need to be retained.
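To illustrate how single-end alignment could be retained alongside the `-p` change, here is a minimal sketch of a command builder that adds `-p` only on request. The function name `build_bwa_mem_cmd` and its parameters are hypothetical and do not come from `msg.pl` or `parse_and_map.py`. It assumes the paired reads are stored interleaved in a single FASTQ file, which is what `bwa mem -p` expects.

```python
def build_bwa_mem_cmd(ref_fasta, reads_fastq, paired_interleaved=False, threads=1):
    """Return a `bwa mem` argument list (hypothetical helper, not MSG code).

    When paired_interleaved is True, `-p` tells bwa mem to treat the input
    as interleaved paired-end reads; otherwise reads are aligned single-end.
    """
    cmd = ["bwa", "mem", "-t", str(threads)]
    if paired_interleaved:
        cmd.append("-p")
    cmd += [ref_fasta, reads_fastq]
    return cmd


if __name__ == "__main__":
    # Single-end (current behaviour) versus paired-end (proposed option).
    print(" ".join(build_bwa_mem_cmd("ref.fa", "reads.fq")))
    print(" ".join(build_bwa_mem_cmd("ref.fa", "reads.fq", paired_interleaved=True)))
```

Keeping the switch as an explicit argument that defaults to off would leave existing single-end runs unchanged.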
Using the paired-end alignment option also has consequences for the flags assigned to reads in SAM files. Instead of only `0`, `4` and `16`, many more flags must be considered. The only script that I think is affected is `extract-ref-alleles.py`. Below are the modifications I had to make. These are compatible with single-end reads, as `0` and `16` are retained.
https://github.com/JaneliaSciComp/msg/blob/e6fedcb88f51d7f7a876f7f2c62f18b96dd06e6b/extract-ref-alleles.py#L285 should be replaced with:
https://github.com/JaneliaSciComp/msg/blob/e6fedcb88f51d7f7a876f7f2c62f18b96dd06e6b/extract-ref-alleles.py#L332 should be replaced with:
https://github.com/JaneliaSciComp/msg/blob/e6fedcb88f51d7f7a876f7f2c62f18b96dd06e6b/extract-ref-alleles.py#L578-L579 should be replaced with:
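The replacement snippets referred to above are not reproduced in this text, and the following is not a reconstruction of them. As a general illustration of why more flags appear with paired-end alignment, here is a sketch that tests individual bits of the SAM flag instead of comparing it to the literal values `0`, `4` and `16`. The bit values are those defined in the SAM specification; the function names `is_usable` and `strand` are hypothetical and do not come from `extract-ref-alleles.py`.

```python
# SAM flag bits, as defined in the SAM specification.
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # both mates aligned as a proper pair
UNMAPPED = 0x4        # read itself is unmapped
REVERSE = 0x10        # read aligned to the reverse strand
SECONDARY = 0x100     # secondary alignment
SUPPLEMENTARY = 0x800 # supplementary (chimeric) alignment


def is_usable(flag, require_proper_pair=False):
    """Hypothetical filter: keep mapped, primary alignments only."""
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    # Optionally drop paired reads whose mates are not properly paired.
    if require_proper_pair and (flag & PAIRED) and not (flag & PROPER_PAIR):
        return False
    return True


def strand(flag):
    """Return '-' for reverse-strand alignments, '+' otherwise."""
    return "-" if flag & REVERSE else "+"


if __name__ == "__main__":
    # 0 and 16 are single-end flags; 99 and 147 are a typical proper pair.
    for flag in (0, 16, 4, 99, 147):
        print(flag, is_usable(flag), strand(flag))
```

Because single-end flags `0` and `16` pass the same bit tests, a check written this way handles both single-end and paired-end SAM files.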
I understand that upgrading the pipeline would require some effort and is a lower priority than the previous issue. In any case, if work is done on this aspect, I would be happy to run tests if needed.