BiodataAnalysisGroup / UMIc

A framework implementing a method for UMI deduplication and reads correction.
MIT License

long vectors not supported #9

Open Jappy0 opened 3 months ago

Jappy0 commented 3 months ago

Hi there,

Thanks for the good framework for UMI deduplication and read correction.

I could successfully run UMIc on some datasets containing several million reads, but when I ran it on a dataset with more than 10 million reads, I got the following error message:

Error in asMethod(object) : long vectors not supported yet: memory.c:3888
Calls: single -> as -> asMethod
Execution halted

Does this mean that UMIc cannot handle large-sized datasets?

Thanks for your time and attention.

Best regards,

Jappy

npechl commented 3 months ago

Hi @Jappy0,

Thank you for opening this issue!

Indeed, UMIc cannot currently handle long vectors. One possible workaround is to divide your dataset into smaller batches and run UMIc on each batch separately. However, we understand that this approach may not be suitable for all use cases.
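
As a rough illustration of the batching workaround, here is a minimal Python sketch that splits a single-end FASTQ file into batches so that reads with the same UMI end up in the same file. It is not part of UMIc. The function names (`read_fastq`, `split_by_umi`), the output file naming, and the assumption that the UMI is the first `umi_len` bases of each read are all hypothetical and would need adapting to the actual data layout.

```python
import os
import zlib
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (header, sequence, plus, quality)."""
    while True:
        record = tuple(islice(handle, 4))
        # Stop at end of file or at a trailing blank line.
        if not record or not record[0].strip():
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_by_umi(fastq_path, out_dir, umi_len, n_batches):
    """Write each read to one of n_batches files, chosen from its UMI.

    Assumption (hypothetical): the UMI is the first `umi_len` bases of the read.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"batch_{i:03d}.fastq") for i in range(n_batches)]
    outputs = [open(p, "w") for p in paths]
    try:
        with open(fastq_path) as handle:
            for record in read_fastq(handle):
                umi = record[1].strip()[:umi_len]
                # crc32 is stable across runs, unlike the built-in hash() for str.
                batch = zlib.crc32(umi.encode("ascii")) % n_batches
                outputs[batch].writelines(record)
    finally:
        for out in outputs:
            out.close()
    return paths
```

Bucketing by UMI rather than by read position keeps exact duplicates together, so each batch can be deduplicated on its own. Reads whose UMIs differ by a base or two can still land in different batches, so any merging of near-identical UMIs would not happen across batch boundaries. Paired-end data would also need both mates routed by the same key.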

Jappy0 commented 3 months ago

Hi @npechl.

OK, I see. Finding a single method that suits every use case is challenging.

Thanks for the reply. Have a good day.

Best regards,

Jappy