blachlylab / fade

Fragmentase Artifact Detection and Elimination
MIT License

fade stats-clip output documentation? #30

Open: jon-nowacki opened this issue 2 years ago

jon-nowacki commented 2 years ago

Is there a description for these values? They are the output of fade stats-clip.

Also, do you have a way to produce a histogram of the soft-clip lengths? It would be a great way to assess the quality of reads coming off the sequencing machine.

charlesgregory commented 2 years ago

No, I don't have it explicitly documented anywhere.

As for a histogram, I think it could be created with standard bash tools.

The command below should extract the lengths of all the soft-clipped sequences from fade's stats-clip output:

```shell
cut -f3 fade_stats_clip_output.tsv | awk '{ print length }' > soft_clipped.lengths.txt
```

Then you can follow this Stack Overflow question: https://stackoverflow.com/questions/39614454/creating-histograms-in-bash

Save their script to a file named hist.sh (I would also change the bin size to something like 3 bp), then run:

```shell
chmod +x hist.sh
./hist.sh soft_clipped.lengths.txt
```

That should give you what you need: one column of bins and one column of counts (the number of lengths falling in each bin).
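If you would rather not depend on the Stack Overflow script, the same binning can be done with awk alone. This is a minimal sketch, not part of fade: length_histogram is a hypothetical helper name, it assumes one integer length per line (as in soft_clipped.lengths.txt), and it uses 3 bp bins unless another width is given. Each output line is a bin range followed by the number of lengths that fall in it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a fade subcommand): bin integer lengths,
# one per line, into fixed-width bins and count the values in each bin.
# Usage: length_histogram <lengths_file> [bin_width]   (default width: 3 bp)
length_histogram() {
  local bin_width="${2:-3}"
  awk -v w="$bin_width" '
    # Skip blank lines; map each length to the start of its bin.
    NF { start = int($1 / w) * w; count[start]++ }
    # Print "start-end<TAB>count" for every non-empty bin.
    END { for (s in count) printf "%d-%d\t%d\n", s, s + w - 1, count[s] }
  ' "$1" | sort -n   # awk array order is unspecified, so sort by bin start
}
```

For example, `length_histogram soft_clipped.lengths.txt 3`. Bins that contain no values are simply omitted rather than printed with a count of zero.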

I haven't tested this yet, though. I could add a histogram-clips subcommand to fade, but it would produce similar results. Plotting a histogram would be outside fade's scope, but the stats-clip output has the data you need. Let me know if that helps.
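For a quick look in the terminal without a plotting tool, a bin/count table can be turned into text bars. This is again a hedged sketch outside fade: text_bars is a hypothetical helper, it assumes tab-separated input with a bin label in the first column and a count in the second (change `-F` if your bins are space-separated), and it scales bars so the largest count is at most 50 characters wide by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: draw a terminal bar chart from "bin<TAB>count" lines.
# Usage: text_bars [file|-] [max_width]   (reads stdin when no file is given)
text_bars() {
  local max_width="${2:-50}"
  awk -F'\t' -v mw="$max_width" '
    BEGIN { max = 0 }
    # Remember every row and track the largest count.
    { label[NR] = $1; n[NR] = $2 + 0; if (n[NR] > max) max = n[NR] }
    END {
      for (i = 1; i <= NR; i++) {
        # Scale each bar relative to the largest count, rounding to nearest.
        len = (max > 0) ? int(n[i] * mw / max + 0.5) : 0
        bar = ""
        for (j = 0; j < len; j++) bar = bar "#"
        printf "%s\t%d\t%s\n", label[i], n[i], bar
      }
    }
  ' "${1:--}"
}
```

A bin whose count is very small relative to the largest one can round down to an empty bar; the numeric count is printed next to each bar so nothing is lost.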