ComparativeGenomicsToolkit / paffy

Small C/CLI library of pairwise alignment format (PAF) tools used by Cactus

feature request paf filter #7

Open colindaven opened 4 months ago

colindaven commented 4 months ago

Hi,

I couldn't find any utilities for filtering pafs and this tool seems to be the most capable and toolkit-like.

Maybe a filter command could be added, so that users don't have to hack together awk one-liners like the one I used here:

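   # This is Nextflow, so awk's $ is escaped as \$; $params.min_align_length and $paf are Nextflow variables.
   # P = alignment block length (PAF column 11) as a percentage of query length (column 2); print P and the record if P >= threshold.
   awk '{ P = \$11 / \$2 * 100; if (P >= $params.min_align_length) print P "\t" \$0 }' $paf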
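For comparison, here is a minimal C sketch of the same filter, the kind of logic a dedicated filter command might wrap. It is not part of paffy, and the function names `paf_query_block_pct` and `paf_filter_query_pct` are hypothetical. Like awk's default field splitting, it splits on whitespace, reads column 2 (query length) and column 11 (alignment block length), and writes the percentage, a tab, and the original record for lines at or above the threshold. It assumes a POSIX system, using `getline` so that long lines (e.g. with a `cg:Z` CIGAR tag) are read whole, and it skips lines with a non-positive query length instead of dividing by zero. Note that column 11 counts alignment columns including gaps, so this percentage can exceed 100 and is not strictly query coverage.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of paffy): compute the alignment block length
 * (PAF column 11) as a percentage of the query length (column 2).
 * Returns 1 on success, 0 if the line is malformed or the query length is not
 * positive (awk would instead abort with a division-by-zero error). */
int paf_query_block_pct(const char *line, double *pct) {
    long query_len, block_len;
    /* Skip column 1, read column 2, skip columns 3-10, read column 11.
     * %*s and %ld split on whitespace, like awk's default field separator. */
    if (sscanf(line, "%*s %ld %*s %*s %*s %*s %*s %*s %*s %*s %ld",
               &query_len, &block_len) != 2)
        return 0;
    if (query_len <= 0)
        return 0;
    *pct = 100.0 * (double)block_len / (double)query_len;
    return 1;
}

/* Hypothetical filter: copy records whose percentage is >= min_pct from `in`
 * to `out`, each prefixed by the percentage and a tab (like the awk one-liner).
 * Returns the number of records kept. */
long paf_filter_query_pct(FILE *in, FILE *out, double min_pct) {
    char *line = NULL; /* grown by getline() as needed, so long lines stay whole */
    size_t cap = 0;
    ssize_t len;
    long kept = 0;
    while ((len = getline(&line, &cap, in)) != -1) {
        double pct;
        if (paf_query_block_pct(line, &pct) && pct >= min_pct) {
            fprintf(out, "%g\t%s", pct, line);
            if (line[len - 1] != '\n') /* last line had no trailing newline */
                fputc('\n', out);
            kept++;
        }
    }
    free(line);
    return kept;
}
```

A `main` wrapping this would parse the threshold from `argv[1]` and call `paf_filter_query_pct(stdin, stdout, threshold)`.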

Filters might include

It might be easy to do this in awk, but it is a bit hacky and error-prone.
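Part of what makes the awk version error-prone is indexing columns by bare number (`$2`, `$11`). A minimal C sketch of the alternative, not paffy's API: parse the 12 mandatory PAF columns once into named fields, then express each filter against those names. `PafRecord`, `PafFilter`, `paf_parse`, `paf_passes`, and the four thresholds are hypothetical examples of what such a filter might check. Identity here is residue matches divided by block length (column 10 / column 11), and query coverage is the query interval (column 4 minus column 3) divided by query length; sequence names are assumed to be shorter than 256 characters.

```c
#include <stdio.h>

/* The 12 mandatory PAF columns, parsed into named fields. */
typedef struct {
    char qname[256];          /* column 1 (assumed < 256 characters) */
    long qlen, qstart, qend;  /* columns 2-4: query length, start, end */
    char strand;              /* column 5: '+' or '-' */
    char tname[256];          /* column 6 (assumed < 256 characters) */
    long tlen, tstart, tend;  /* columns 7-9: target length, start, end */
    long matches;             /* column 10: number of residue matches */
    long block_len;           /* column 11: alignment block length (includes gaps) */
    int mapq;                 /* column 12: mapping quality */
} PafRecord;

/* Hypothetical thresholds; a record passes only if it meets all of them. */
typedef struct {
    long min_block_len;
    int min_mapq;
    double min_identity;   /* matches / block_len, in [0, 1] */
    double min_query_cov;  /* (qend - qstart) / qlen, in [0, 1] */
} PafFilter;

/* Parse one PAF line; returns 1 if all 12 mandatory columns were read and the
 * lengths used as divisors below are positive, else 0. */
int paf_parse(const char *line, PafRecord *r) {
    int n = sscanf(line, "%255s %ld %ld %ld %c %255s %ld %ld %ld %ld %ld %d",
                   r->qname, &r->qlen, &r->qstart, &r->qend, &r->strand,
                   r->tname, &r->tlen, &r->tstart, &r->tend,
                   &r->matches, &r->block_len, &r->mapq);
    return n == 12 && r->qlen > 0 && r->block_len > 0;
}

/* Returns 1 if a record accepted by paf_parse meets every threshold in f. */
int paf_passes(const PafRecord *r, const PafFilter *f) {
    double identity = (double)r->matches / (double)r->block_len;
    double query_cov = (double)(r->qend - r->qstart) / (double)r->qlen;
    return r->block_len >= f->min_block_len
        && r->mapq >= f->min_mapq
        && identity >= f->min_identity
        && query_cov >= f->min_query_cov;
}
```

With this layout, adding a criterion means adding one field and one comparison, rather than another hand-indexed awk expression.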

As far as I know, other repos do not offer a PAF filter either, for example:

https://github.com/AndreaGuarracino/paf2chain

https://github.com/AndreaGuarracino/pafgnostic

https://github.com/ekg/pafplot

etc.

Thanks

glennhickey commented 4 months ago

I agree that these are useful features to add.

In the meantime, you can consider using gaffilter. It's a little specific to minigraph-cactus, but I think it does some of what you want (and despite its name, it works on PAF as well as GAF).

colindaven commented 4 months ago

Hi @glennhickey, thanks, that is useful.

I wrote something similar for GAF filtering once and noticed it at least partially worked on PAF too.