ComparativeGenomicsToolkit / taffy

This is a C/Python/CLI library for working with TAF (.taf, .taf.gz) and MAF (.maf) alignment files
MIT License

taffy view with a --region SEGVs when stdin as input #62

Open diekhans opened 1 month ago

diekhans commented 1 month ago
% ./bin/taffy view --region=Anc0refChr0:10-100 < tests/dupe_test.maf
#taf version:1 scoring:N/A
Segmentation fault (core dumped)

It would be useful to be able to select regions from a stream rather than an indexed taf.

benedictpaten commented 1 month ago

It should not seg fault, but this behavior would not be trivial to implement given the current design. I'd vote to "fix" by just printing a warning and exiting.
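A minimal sketch of the "warn and exit" behavior, written in Python for brevity rather than in taffy's C. The program name `view_sketch` and the `--input` option are hypothetical stand-ins, not taffy's actual parser; only `--region` comes from the report above.

```python
import argparse


def parse_args(argv):
    """Parse a toy subset of options; names are illustrative, not taffy's real CLI."""
    parser = argparse.ArgumentParser(prog="view_sketch")
    parser.add_argument("--region", help="e.g. Anc0refChr0:10-100")
    # Hypothetical option: when omitted, input is assumed to come from stdin.
    parser.add_argument("--input", help="alignment file; omitted means stdin")
    args = parser.parse_args(argv)
    # An indexed region lookup needs a seekable file, so reject the
    # combination up front instead of failing later.
    if args.region is not None and args.input is None:
        # parser.error prints the message to stderr and exits with status 2.
        parser.error("--region requires an indexed input file, not stdin")
    return args
```

Checking the combination at argument-parsing time means the user gets a one-line message before any alignment data is read.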

diekhans commented 1 month ago

Yes, these are really two different issues.

The use case pipeline for a stream is:

BTW, there is now a Rust library to read bigBed files: https://github.com/jackh726/bigtools

which could be used to have taffy directly access bigMaf without the huge weight of the kent library.

glennhickey commented 1 month ago

The crash is fixed in #62. Thanks for reporting it.

I guess the other issue is that you want region queries without the index. This is certainly possible and would be analogous to bcftools view -t (where the current taffy view -r matches bcftools view -r), but I don't personally have any plans to implement it anytime soon. In your use case, why not just pull the exact region out of the bigMaf?
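To make the index-free idea concrete, here is a rough Python sketch of a streaming region filter over MAF text. It is not taffy's implementation, and every function name in it is hypothetical. It assumes the region is 0-based and half-open, that the first `s` row of each block is the reference and lies on the + strand, and it drops any attributes on the `a` line as well as non-`s` rows.

```python
from itertools import chain


def parse_region(region):
    """Split 'name:start-end' into (name, start, end); assumed 0-based, half-open."""
    name, _, span = region.rpartition(":")
    start, _, end = span.partition("-")
    return name, int(start), int(end)


def trim_block(rows, start, end):
    """Trim rows to reference interval [start, end); return None if no overlap.

    Each row is [src, start, size, strand, src_size, text]; rows[0] is the
    reference row and is assumed to be on the + strand.
    """
    first = last = None
    pos = rows[0][1]
    for col, base in enumerate(rows[0][5]):
        if base == "-":
            continue  # gap columns do not advance the reference position
        if start <= pos < end:
            if first is None:
                first = col
            last = col
        pos += 1
    if first is None:
        return None
    trimmed = []
    for src, row_start, _size, strand, src_size, text in rows:
        skipped = len(text[:first].replace("-", ""))  # bases cut from the left
        kept = text[first:last + 1]
        new_size = len(kept.replace("-", ""))
        if new_size == 0:
            continue  # row is all gaps inside the kept columns
        trimmed.append([src, row_start + skipped, new_size, strand, src_size, kept])
    return trimmed


def filter_maf(lines, region):
    """Yield trimmed blocks whose reference row overlaps the region, in one pass."""
    name, start, end = parse_region(region)
    rows = []
    for line in chain(lines, [""]):  # trailing "" flushes the final block
        fields = line.split()
        if fields and fields[0] == "s":
            src, row_start, size, strand, src_size, text = fields[1:7]
            rows.append([src, int(row_start), int(size), strand, int(src_size), text])
        elif not fields or fields[0] == "a":
            # Block boundary: emit the block collected so far, if it matches.
            if rows and rows[0][0] == name:
                block = trim_block(rows, start, end)
                if block:
                    yield block
            rows = []


def format_block(rows):
    """Render one block as MAF text (the 'a' line carries no attributes here)."""
    body = "".join("s %s %d %d %s %d %s\n" % tuple(r) for r in rows)
    return "a\n" + body + "\n"
```

Passing `sys.stdin` as `lines` and writing each `format_block` result to stdout would give a filter usable in a pipe. Because nothing is indexed, every block is read once and memory use stays at one block, at the cost of scanning the whole stream.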

diekhans commented 1 month ago

thanks

bigMafToMaf pulls blocks, not regions. I wanted this trimmed to simplify downstream analysis. In this case, miRNAs.

I just wrote temporary taffy files, indexed them, and then extracted the region.

taffy did a beautiful job of creating single blocks.