adnaniazi / tailfindr

An R package for estimating poly(A)-tail lengths in Oxford Nanopore RNA and DNA reads.
https://www.cbu.uib.no/valen/
GNU General Public License v3.0

Error in `list_unchop()` #62

Open Oscargblay opened 1 year ago

Oscargblay commented 1 year ago

Hi, I've been trying to estimate the length of the poly(A) / poly(T) tails of cDNA molecules sequenced on an ONT MinION. The library was prepared with the cDNA-PCR sequencing kit.

I did the basecalling with Guppy using the following command:

"C:\Program Files\OxfordNanopore\ont-guppy-cpu\bin\guppy_basecaller.exe" --config dna_r9.4.1_450bps_fast.cfg --input_path "C:\Users\Nanopore\Desktop\tailfindr_test\original_fastq_pass" --recursive --save_path "C:\Users\Nanopore\Desktop\tailfindr_test\basecalling_output" --fast5_out --trim_strategy none --num_callers 1 --cpu_threads_per_caller 14 2>&1 | tee logfile.txt

The --config argument is set to dna_r9.4.1_450bps_fast.cfg to match the flow cell and kit used for sequencing. Then I ran the tailfindr package in RStudio with the following code:

df <- find_tails(fast5_dir = 'C:/Users/Nanopore/Desktop/tailfindr_test/test_subset/', #the fast5 files (output from guppy) are here
                 save_dir = 'C:/Users/Nanopore/Desktop/tailfindr_test/',
                 csv_filename = 'cdna_tails.csv',
                 num_cores = 8,
                 basecall_group = 'Basecall_1D_000', #See comment 1
                 save_plots = FALSE,
                 plotting_library = 'rbokeh')

The analysis started and ran through to the last chunk after several days of computing:

" Processing chunk 1119 of 1119 |==========================================================================================================================================================================| 100%"

However, it gave the following error just after completion:

> • Formatting the tail data...
> Error in `list_unchop()`:
> ! Can't combine `x[[1]]` <character> and `x[[18]]` <integer>.
> Run `rlang::last_trace()` to see where the error occurred.
> There were 50 or more warnings (use warnings() to see the first 50)

The warnings look like this:

Warning messages:
1: In fun(result.5, result.2, result.8, result.4, result.3,  ... :
  number of columns of result is not a multiple of vector length (arg 4)
2: In fun(accum, result.103, result.99, result.104, result.105,  ... :
  number of columns of result is not a multiple of vector length (arg 14)
3: In fun(accum, result.200, result.201, result.202, result.203,  ... :
  number of columns of result is not a multiple of vector length (arg 25)
4: In fun(accum, result.299, result.301, result.300, result.303,  ... :
  number of columns of result is not a multiple of vector length (arg 3)
5: In fun(accum, result.399, result.398, result.401, result.400,  ... :
  number of columns of result is not a multiple of vector length (arg 6)
6: In fun(accum, result.497, result.498, result.499, result.500,  ... :
  number of columns of result is not a multiple of vector length (arg 10)
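
One possible reading of these messages, offered as a guess and not confirmed against the tailfindr source: the "not a multiple of vector length" warning is what base R's rbind() emits when it recycles a result that is shorter than the others, and the recycled values can leave a column holding both character and integer entries, which a later type-strict combine then rejects. The minimal sketch below reproduces that pattern in miniature. The field names and values are hypothetical, and the final step only runs if the vctrs package is installed.

```r
# Hypothetical per-read results; the field names are illustrative only
# and are not taken from tailfindr.
ok_1  <- list(read_id = "read-a", tail_start = 100L, tail_end = 250L)
ok_2  <- list(read_id = "read-b", tail_start = 90L,  tail_end = 300L)
short <- list(read_id = "read-c", tail_start = 120L)  # one field missing

# Row-bind the results and collect any warnings instead of printing them.
rbind_warnings <- character(0)
combined <- withCallingHandlers(
  rbind(ok_1, ok_2, short),
  warning = function(w) {
    rbind_warnings <<- c(rbind_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(rbind_warnings)

# The short result was recycled, so the third column now holds a read ID
# next to integers. Report the class of each entry in that column.
col_types <- vapply(combined[, 3], function(v) class(v)[1], character(1))
print(col_types)

# Flattening a mixed-type column with vctrs fails with a type-combination
# error of the same kind as the one reported above.
if (requireNamespace("vctrs", quietly = TRUE)) {
  msg <- tryCatch(
    {
      vctrs::list_unchop(unname(combined[, 3]))
      NA_character_
    },
    error = function(e) conditionMessage(e)
  )
  print(msg)
}
```

If this is what is happening, the reads whose results have an unexpected length would be the ones to inspect first; the warning's "arg" index points at the offending result within each combine call.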

Comment 1: I have tried both Basecall_1D_000 and Basecall_1D_001, but only 000 works. With 001, the following error appears:

"Error in `[[.H5File`(f5_obj, basecaller_path) : 
  An object with name /read_000a6cf7-6060-417c-8788-1a453011d6db/Analyses/Basecall_1D_001 does not exist in this group"
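
The message above says the Basecall_1D_001 group is absent from the file, which suggests these fast5 files carry only one basecall group. A quick way to check is to list the groups in a single file. The sketch below assumes the hdf5r package is installed (the `[[.H5File` call in the error comes from it); the helper names and the example file name are hypothetical.

```r
# Keep only the names of Basecall_1D_* analysis groups from a vector of
# HDF5 group paths (base R only).
basecall_groups_from_paths <- function(paths) {
  hits <- grep("Analyses/Basecall_1D_[0-9]+$", paths, value = TRUE)
  sort(unique(basename(hits)))
}

# Open one fast5 file read-only and report which basecall groups it holds.
# Requires hdf5r; the file is closed again when the function exits.
list_basecall_groups <- function(fast5_path) {
  f5 <- hdf5r::H5File$new(fast5_path, mode = "r")
  on.exit(f5$close_all(), add = TRUE)
  groups <- hdf5r::list.groups(f5, recursive = TRUE)
  basecall_groups_from_paths(groups)
}

# Example call (placeholder file name, adjust to a real file):
# list_basecall_groups("C:/Users/Nanopore/Desktop/tailfindr_test/test_subset/example.fast5")
```

Whatever group name this returns is the value to pass as basecall_group in find_tails().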

I was wondering if someone has faced this error before and might know how to solve it. Thank you in advance, Oscar

adnaniazi commented 10 months ago

Hi, I no longer maintain tailfindr. Dorado from ONT can now do poly(A)-tail estimation on both RNA and cDNA, so I would encourage you to use it instead of tailfindr. Best, Adnan