databio / bedboss

Python pipeline for processing BED files for BEDbase
https://docs.bedbase.org
BSD 2-Clause "Simplified" License

bedboss should use bbclient! #36

Closed by khoroshevskyi 5 months ago

khoroshevskyi commented 7 months ago

What if we used bbclient to cache files and then operated on the cached copies, instead of accessing locally stored files directly, copying them, preprocessing them, and then uploading them to S3?

nsheff commented 7 months ago

Yes. We should first download the BED file with bbclient, then use bbclient to load it into bedboss.
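A minimal sketch of the download-once, work-on-the-cached-copy flow discussed here. `LocalBedCache`, `get()` and `fake_fetch` are hypothetical names, not bbclient's actual API. The fetch function is injected so the example runs offline, and cached files are keyed by a SHA-256 digest of the source string (an assumption for illustration).

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class LocalBedCache:
    """Hypothetical download-once cache, standing in for bbclient's caching role."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str, Path], None]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self._fetch = fetch  # writes the resource at `source` to the given path

    def path_for(self, source: str) -> Path:
        # Key each cached file by a digest of its source URL or path.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.bed"

    def get(self, source: str) -> Path:
        """Return a local path for `source`, fetching it only on a cache miss."""
        target = self.path_for(source)
        if not target.exists():
            partial = target.with_suffix(".part")
            self._fetch(source, partial)
            partial.replace(target)  # rename only after the fetch finished
        return target


# Usage: a fake fetcher stands in for a real download so the sketch runs offline.
fetch_calls: list = []


def fake_fetch(source: str, dest: Path) -> None:
    fetch_calls.append(source)
    dest.write_text("chr1\t10\t20\n")


with tempfile.TemporaryDirectory() as tmp:
    cache = LocalBedCache(Path(tmp), fake_fetch)
    first = cache.get("https://example.org/a.bed")
    second = cache.get("https://example.org/a.bed")  # cache hit, no second fetch
    same_path = first == second
    content = first.read_text()
```

Writing to a temporary `.part` file and renaming it afterwards means an interrupted download is never mistaken for a complete cached file; downstream bedboss steps would then only ever see the cached path.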

If there are BED files we want to use that fail with bbclient, then the cleaning steps they need should move from bedboss to bbclient, but only the most basic ones. Anything pipeline-related, or anything that produces a result to report, should stay in bedboss.
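To illustrate the split, here is a sketch of the kind of basic cleaning that could live in the client: skipping header and blank lines, and rejecting rows without valid integer coordinates while collecting a list of problems instead of failing outright. `basic_clean` and `CleanResult` are hypothetical names; anything beyond checks like these would remain a bedboss pipeline step.

```python
from dataclasses import dataclass, field
from typing import Iterable

# Lines starting with these are treated as headers/comments and dropped silently.
HEADER_PREFIXES = ("track", "browser", "#")


@dataclass
class CleanResult:
    regions: list = field(default_factory=list)   # (chrom, start, end) tuples
    problems: list = field(default_factory=list)  # human-readable issues


def basic_clean(lines: Iterable[str]) -> CleanResult:
    """Hypothetical client-side cleaning: only the most basic validity checks."""
    result = CleanResult()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith(HEADER_PREFIXES):
            continue
        fields = line.split()  # tolerate tab- or space-delimited rows
        if len(fields) < 3:
            result.problems.append(f"line {lineno}: fewer than 3 columns")
            continue
        chrom, start_s, end_s = fields[:3]
        try:
            start, end = int(start_s), int(end_s)
        except ValueError:
            result.problems.append(f"line {lineno}: non-integer coordinates")
            continue
        if start < 0 or start > end:
            result.problems.append(f"line {lineno}: invalid interval {start}-{end}")
            continue
        result.regions.append((chrom, start, end))
    return result


sample = [
    "track name=example\n",
    "chr1\t100\t200\n",
    "chr1\t300\n",
    "chr2\tabc\t50\n",
    "chr3\t500\t400\n",
    "\n",
]
cleaned = basic_clean(sample)
```

Returning problems rather than raising keeps the client usable on messy files and leaves it to the caller (here, bedboss) to decide what to report.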

bbclient could return a JSON object with some annotation information when you try to ingest a BED file. It could say 'this BED file has x, y, z problems'. bedboss is then a user of bbclient: it gets this information and reports it with pipestat.
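One way this hand-off could look, as a sketch: the client serializes its findings to JSON, and bedboss parses that payload and passes selected fields to a reporter. The JSON field names, `annotate`, `report_annotation` and `InMemoryReporter` are all assumptions; `InMemoryReporter` only stands in for pipestat and does not use pipestat's API.

```python
import json
from typing import Protocol


def annotate(bed_name: str, problems: list, n_regions: int) -> str:
    """Hypothetical client-side summary returned when a BED file is ingested."""
    return json.dumps({
        "bed_file": bed_name,
        "n_regions": n_regions,
        "problems": problems,
        "ok": not problems,
    })


class Reporter(Protocol):
    def report(self, record_id: str, values: dict) -> None: ...


class InMemoryReporter:
    """Stand-in for a pipestat-backed reporter; it just stores what it is given."""

    def __init__(self) -> None:
        self.records: dict = {}

    def report(self, record_id: str, values: dict) -> None:
        self.records.setdefault(record_id, {}).update(values)


def report_annotation(annotation_json: str, reporter: Reporter) -> None:
    """Pipeline side: consume the client's JSON and report selected results."""
    ann = json.loads(annotation_json)
    reporter.report(ann["bed_file"], {
        "n_regions": ann["n_regions"],
        "n_problems": len(ann["problems"]),
        "problems": "; ".join(ann["problems"]),
    })


payload = annotate("sample1.bed", ["line 3: fewer than 3 columns"], n_regions=1200)
reporter = InMemoryReporter()
report_annotation(payload, reporter)
```

Keeping the payload as plain JSON means the client has no dependency on pipestat; only the pipeline side decides which fields become reported results.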

khoroshevskyi commented 5 months ago

Fixed in 0.2.0.