LYC-vio closed this issue 1 year ago.
Hi @LYC-vio, thank you for posting this question.
The Cue framework currently provides an extensively trained model and out-of-the-box support only for short reads. The Cue-long model is a separate, preliminary proof-of-concept model trained to demonstrate that the framework extends to another technology. We trained and evaluated this model only on limited synthetic data, to show how the framework can be extended to achieve strong performance with different input types, as described in the “Extending Cue” and “Discussion” sections of the manuscript. (More information about this benchmark, the model, and how to reproduce the results is available in our cue-synth-datasets GCS bucket; guidelines for extending the framework to custom technologies are available in the "extensions.ipynb" notebook.) This model is not yet intended for use on real data: much more extensive training and evaluation is needed before it can be deployed on real genomes (similar to the short-read strategy described in the paper). We're working on this now and will release new models and full support for more technologies soon! I'll add further clarification to the README as well.
Thanks, V
Hi @viq854,
Sorry for reopening this issue. Just a quick question: when will Cue support long-read data?
Thank you again for your time and effort.
Best regards, Yichen
Hi @LYC-vio,
We're planning to release fully trained models for PacBio by the end of the summer.
Best, V
@viq854 Hi! How is the PacBio support coming along?
Hi,
Thank you for developing this excellent tool. I recently tried to use Cue to call SVs on a long-read BAM file (NA24385_Pacbio_CLR_SRX7668835, aligned to hg19 using minimap2), but got an empty VCF output with no error reported.
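A quick way to confirm that a VCF is genuinely empty (header lines only, no records) is to count its non-`#` lines. The sketch below is a generic helper, not part of Cue; the function names are hypothetical, and it assumes the output may be either plain text or gzip-compressed (the choice is made purely from a `.gz` suffix).

```python
import gzip
from typing import Iterable


def count_vcf_records(lines: Iterable[str]) -> int:
    """Count VCF data records: non-blank lines that are not '#' header lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def count_vcf_records_in_file(path: str) -> int:
    """Open a plain or gzip-compressed VCF (picked by '.gz' suffix) and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_vcf_records(handle)
```

For example, `count_vcf_records_in_file("output.vcf")` returning `0` would confirm a header-only file (the file name is a placeholder).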
The log contained lines saying that no intervals were selected:
However, I have no idea what might be causing this.
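One generic thing worth ruling out (an assumption on our part, not a cause confirmed anywhere in this thread) is a mismatch between the contig names in the BAM header and those in the reference, e.g. hg19-style `chr1` vs. GRCh37/b37-style `1`. The helpers below are a hypothetical sketch, not part of Cue: they compare the `@SQ` `SN:` names from `samtools view -H` output with the first column of the reference's `.fai` index.

```python
from typing import Dict, List


def contigs_from_sam_header(header_text: str) -> List[str]:
    """Extract contig names from the SN: tag of @SQ lines (e.g. `samtools view -H` output)."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def contigs_from_fai(fai_text: str) -> List[str]:
    """Extract contig names from the first column of a samtools .fai index."""
    return [line.split("\t", 1)[0] for line in fai_text.splitlines() if line.strip()]


def compare_contigs(bam_contigs: List[str], ref_contigs: List[str]) -> Dict[str, List[str]]:
    """Report which contig names are shared and which appear on only one side."""
    bam_set, ref_set = set(bam_contigs), set(ref_contigs)
    return {
        "shared": sorted(bam_set & ref_set),
        "only_in_bam": sorted(bam_set - ref_set),
        "only_in_reference": sorted(ref_set - bam_set),
    }


# Toy inputs: a chr-prefixed (hg19-style) BAM header vs. an unprefixed (b37-style) .fai.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:800\n"
fai = "1\t1000\t3\t60\t61\n2\t800\t1023\t60\t61\n"
report = compare_contigs(contigs_from_sam_header(header), contigs_from_fai(fai))
print(report["shared"])  # []
```

An empty `shared` list alongside non-empty `only_in_*` lists points at a naming mismatch; if the names line up, this particular explanation can be ruled out.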
I also noticed that you ran Cue-long on the CLR data in your paper. Does that refer to a different version of Cue, or are additional settings required in the YAML configuration for long reads?
Thank you!
Best, Yichen
Here's the detailed configuration I used in my run:
The BAM file was generated with: