databio / bedbase

Aggregate, analyze, and serve genomic regions.
http://bedbase.org/

Bedqc: bedboss should automatically ignore bedqc-filtered BED files #27

Closed xuebingjie1990 closed 4 months ago

xuebingjie1990 commented 1 year ago

Currently, bedqc flags BED files that 1) are larger than 2 GB, 2) have more than 5 million regions, or 3) have a mean region width of less than 10 bp. A manual step is then required to either remove the flagged files or keep them for downstream processing.

Instead, we should improve the filters and add functions to deal with the flagged files so bedqc can be part of the automated process.
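
To make the three criteria concrete, here is a minimal sketch of a check that collects the reasons a file would be flagged instead of stopping at the first one. It is not the bedqc implementation: the function name `qc_reasons` and the constant names are hypothetical, "2G" is assumed to mean 2 GiB, and the input is assumed to be an uncompressed BED file.

```python
import os
from typing import List

# Thresholds quoted in this issue; the constant names are illustrative only.
MAX_FILE_SIZE_BYTES = 2 * 1024**3  # "larger than 2G", assumed here to mean 2 GiB
MAX_REGION_COUNT = 5_000_000
MIN_MEAN_REGION_WIDTH = 10  # bp


def qc_reasons(bed_path: str) -> List[str]:
    """Return the reasons a BED file would be flagged (empty list means it passes)."""
    reasons = []
    if os.path.getsize(bed_path) > MAX_FILE_SIZE_BYTES:
        reasons.append("file size exceeds the 2G limit")

    n_regions = 0
    total_width = 0
    with open(bed_path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines, comments, and track/browser header lines.
            if len(fields) < 3 or line.startswith("#") or fields[0] in ("track", "browser"):
                continue
            n_regions += 1
            total_width += int(fields[2]) - int(fields[1])

    if n_regions > MAX_REGION_COUNT:
        reasons.append("more than 5 million regions")
    if n_regions and total_width / n_regions < MIN_MEAN_REGION_WIDTH:
        reasons.append("mean region width below 10 bp")
    return reasons
```

Returning a list rather than raising lets an automated run decide what to do with a flagged file (skip it, keep it, or just record it) without a manual step.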

nsheff commented 1 year ago

The result should be reported using pipestat so that the final output statistics table shows which files are not passing QC.

It should also be possible to find out why they are not passing QC.
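
As a rough illustration of what such a result could carry, the sketch below builds a per-file record with a pass/fail flag and the list of reasons, and then pulls the failing files out of a table of records. It is plain Python and deliberately leaves out the pipestat call itself; the key names `qc_passed` and `qc_reasons` and both function names are hypothetical.

```python
from typing import Dict, List


def qc_record(reasons: List[str]) -> Dict[str, object]:
    """Build the values to report for one file: a pass/fail flag plus the reasons."""
    return {"qc_passed": not reasons, "qc_reasons": list(reasons)}


def failing_files(table: Dict[str, Dict[str, object]]) -> Dict[str, List[str]]:
    """From a {file_name: record} table, map each failing file to its reasons."""
    return {
        name: record["qc_reasons"]
        for name, record in table.items()
        if not record["qc_passed"]
    }


# Example table with one passing and one failing file (made-up file names).
table = {
    "a.bed": qc_record([]),
    "b.bed": qc_record(["more than 5 million regions"]),
}
print(failing_files(table))
```

Keeping the reasons next to the flag means the final statistics table can answer both questions: which files failed, and why.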

khoroshevskyi commented 10 months ago

Right now bedqc raises an error when a file does not pass QC: https://github.com/databio/bedboss/blob/948f5642ce8acc60e836481e984dd57fdb9a89e5/bedboss/bedqc/bedqc.py#L104

bedboss should create, or open, a CSV file that reports why a file didn't pass QC. I think this is already done: https://github.com/databio/bedboss/blob/948f5642ce8acc60e836481e984dd57fdb9a89e5/bedboss/bedqc/bedqc.py#L94-L102
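
For reference, the create-or-open behaviour can be sketched as below. This is not the code at the linked lines: the function name `log_qc_failure` and the column names are hypothetical, and the sketch only shows appending one row per failed file so that processing can continue instead of raising.

```python
import csv
import os
from typing import List


def log_qc_failure(csv_path: str, bed_name: str, reasons: List[str]) -> None:
    """Append one row for a failed file; write the header if the CSV is new or empty."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Column names are illustrative, not taken from bedboss.
            writer.writerow(["file_name", "detail"])
        writer.writerow([bed_name, "; ".join(reasons)])
```

A caller could invoke `log_qc_failure("flagged_bed.csv", "sample.bed", reasons)` whenever the reason list is non-empty and then move on to the next file (both file names here are made up).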