BiodataAnalysisGroup / UMIc

A framework implementing a method for UMI deduplication and reads correction.
MIT License

Invalid class “ShortReadQ” object: sread() and id() length mismatch #10

Open HaiqiChenLab opened 1 week ago

HaiqiChenLab commented 1 week ago

Hi,

I encountered the following error message when trying to run UMIc:

[Screenshot of the error message, taken 2024-09-24 at 11:37:40 PM]

Could you suggest a way to troubleshoot? Thanks!

npechl commented 1 week ago

Hi @HaiqiChenLab,

Thank you for your interest in UMIc. The error suggests that you might have an extra line in your input .fastq files.

Would it be possible to provide us with some extra information about them? For example, how many reads do you have? Are they gzipped files (.fastq.gz)?
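
In case it helps with checking for an extra line, here is a minimal sketch in base R. It assumes the usual layout of four lines per read (no wrapped sequences), and `check_fastq` is a hypothetical helper name, not something provided by UMIc or ShortRead. It counts the lines, reports any leftover beyond complete 4-line records, and points to the first record whose header, separator, or sequence/quality lengths look wrong.

```r
# Sketch: basic structural checks on a (possibly gzipped) FASTQ file.
# Assumption: each read occupies exactly four lines.
# `check_fastq` is a hypothetical helper, not part of UMIc or ShortRead.
check_fastq <- function(path) {
  con <- gzfile(path, open = "rt")   # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE)

  n <- length(lines)
  complete <- n %/% 4                # number of complete 4-line records
  idx <- seq_len(complete) * 4 - 3   # position of each record's header line

  header_ok <- startsWith(lines[idx], "@")
  plus_ok   <- startsWith(lines[idx + 2], "+")
  length_ok <- nchar(lines[idx + 1]) == nchar(lines[idx + 3])

  list(
    n_lines   = n,
    n_records = complete,
    leftover  = n %% 4,              # non-zero: extra or missing lines
    first_bad = which(!(header_ok & plus_ok & length_ok))[1]  # NA if none
  )
}
```

Calling `check_fastq("your_file.fastq.gz")` (the file name is a placeholder) returns a small list. A non-zero `leftover` or a non-`NA` `first_bad` would be consistent with a malformed record, although it does not by itself prove that this is what triggers the error.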

Nikos

HaiqiChenLab commented 1 week ago

Thanks for your prompt reply, Nikos!

There are 439 reads in the file and it's zipped.

npechl commented 1 week ago

Hmm, I’m having trouble understanding the exact error causing the issue. It’s also quite odd that, despite having 439 reads, you’re getting 830/831. Do you believe it would be possible to provide us with an example file you're trying to run, or perhaps a subset of it?

I should also mention that UMIc started as a side project by a former colleague of mine. In its current form, UMIc operates as a collection of R scripts and has a few bugs that we need to address. We are, however, planning to develop it into a stand-alone R package to make it more robust and user-friendly.

HaiqiChenLab commented 1 week ago

What would be the best way to send you the file? It's about 400 KB in size. Thanks!

npechl commented 1 week ago

The file seems small, so feel free to email it to us at: inab.bioinformatics@lists.certh.gr. If it's too large for email, you can send a WeTransfer link instead.

Thank you in advance!

HaiqiChenLab commented 1 week ago

Thank you! I just sent you an email with the fastq file attached. Here's a Dropbox link just in case: https://www.dropbox.com/scl/fi/mtf114rz39z2fkpc3gap5/output_1.fastq.gz?rlkey=v3d4x957ebhw6njknz42tpw4n&st=8zj18by9&dl=0

npechl commented 6 days ago

Hi again,

We have received the .fastq.gz file and I have already managed to run UMIc on your sequences.

As I mentioned previously, UMIc currently runs as a collection of R scripts. This means that the user has to edit some of its input parameters manually (you can find and edit these in the UMIsProject.R script).

With that in mind, these are the input parameters I used for your sequences:

########## Inputs ##########

#type of data - paired or single
pairedData <- FALSE

#UMI located in Read1 --> "R1"
#UMI located in Read1 and Read2 --> "R1 & R2"
UMIlocation <- "R1"

#length of the UMI
UMIlength <- 12 #5

#length of the sequence
sequenceLength <- 22

#min read counts per UMI, for initial data cleaning
countsCutoff <- 5

#max UMI distance for UMI merging
UMIdistance <- 1

#max sequence distance for UMI correction
sequenceDistance <- 3

#inputs folder / working directory
inputsFolder <- "path/to/your/fastq/"

#outputs folder
outputsFolder <- "output/folder/"

The tool produced the output files in the specified output folder.
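
Since the folder paths have to be edited by hand, a small pre-flight check before running the scripts may save a failed run. The following is only a sketch in base R: `preflight` is a hypothetical helper, not part of UMIc, and it assumes the inputs are named `*.fastq` or `*.fastq.gz`. Whether UMIc creates the outputs folder on its own is not something this sketch relies on; it simply creates the folder if it is missing.

```r
# Sketch: sanity checks on the folder parameters before running UMIsProject.R.
# `preflight` is a hypothetical helper and not part of UMIc.
# Assumption: input files end in .fastq or .fastq.gz.
preflight <- function(inputsFolder, outputsFolder) {
  if (!dir.exists(inputsFolder)) {
    stop("inputs folder not found: ", inputsFolder)
  }

  fastq_files <- list.files(inputsFolder, pattern = "\\.fastq(\\.gz)?$")
  if (length(fastq_files) == 0) {
    stop("no .fastq or .fastq.gz files in: ", inputsFolder)
  }

  # Create the outputs folder (and any parent folders) if it does not exist.
  dir.create(outputsFolder, recursive = TRUE, showWarnings = FALSE)

  fastq_files   # names of the input files that were found
}
```

With the values above, this would be called as `preflight(inputsFolder, outputsFolder)` after setting the two paths to real folders.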

Nikos