adnaniazi / tailfindr

An R package for estimating poly(A)-tail lengths in Oxford Nanopore RNA and DNA reads.
https://www.cbu.uib.no/valen/
GNU General Public License v3.0

Blank output when working with real data #30

Closed. maximus-sci closed this issue 2 years ago

maximus-sci commented 2 years ago

Hello,

I'd like to use your tool to calculate polyA tail lengths for my direct RNA sequencing.

I've installed tailfindr in a clean Linux environment (Ubuntu 20.04) with the VBZ plugin and the HDF5 library, and I can successfully run it on the test data you include with the package.

When I try to run it on my own fast5 files, however, I get a data frame with the expected number of rows, but every column except the fast5 file path is NA.

The program appears to complete with no errors:

df <- find_tails(fast5_dir = "/mnt/d/nanoporeData/20220621_IVT_tailLengthCheck/called_fast5/test", 
save_dir = "/mnt/d/nanoporeData/20220621_IVT_tailLengthCheck/called_fast5/test",
csv_filename="restartTest.csv",
num_cores = 4)
── Started tailfindr (version 1.3) ───────────────────────────────────────────────────────────────────────────────────
☰ You have configured tailfindr as following:
❯ fast5_dir:         /mnt/d/nanoporeData/20220621_IVT_tailLengthCheck/called_fast5/test
❯ save_dir:          /mnt/d/nanoporeData/20220621_IVT_tailLengthCheck/called_fast5/test
❯ csv_filename:      restartTest.csv
❯ num_cores:         4
❯ basecall_group:    Basecall_1D_000
❯ save_plots:        FALSE
❯ plot_debug_traces: FALSE
❯ plotting_library:  rbokeh
── Processing started at 2022-06-30 13:20:14 ───────────────────────────────────────────────────────────────────────────────────────
• Searching for all Fast5 files...
  Done! Found 1 Fast5 files.
• Analyzing a single Fast5 file to assess if your data
  is in an acceptable format...
  ✔ The data has been basecalled using Guppy.
  ✔ Flipflop model was used during basecalling.
  ✔ The reads are packed in multi-fast5 file(s).
  ✔ The experiment type is RNA, so we will search
    for poly(A) tails.
  ✔ The reads are 1D reads.
• Starting a parallel compute cluster...
  Done!
• Discovering reads in the 1 multifast5 files...
  Done! Found 4000 reads
• Searching for Poly(A) tails...
  Processing chunk 1 of 1
  |======================================================================| 100%
• Formatting the tail data...
  Done!
• Saving the data in the CSV file...
  Done! Below is the path of the CSV file:
  /mnt/d/nanoporeData/20220621_IVT_tailLengthCheck/called_fast5/test/restartTest.csv
• A logfile containing all this information has been saved in this path:
  /mnt/d/nanoporeData/20220621_IVT_tailLengthCheck/called_fast5/test/2022-06-30_13-20-14_tailfinder.log
── Processing ended at 2022-06-30 13:21:41 ─────────────────────────────────────────────────────────────────────────────────────────
✔ tailfindr finished successfully!
Warning message:
`cols` is now required when using unnest().
Please use `cols = c(read_id, tail_start, tail_end, samples_per_nt, tail_length,
    polya_fastq, file_path)`
> head(df)
# A tibble: 6 × 6
  read_id tail_start tail_end samples_per_nt tail_length file_path
  <lgl>   <lgl>      <lgl>             <dbl>       <dbl> <chr>
1 NA      NA         NA                   NA          NA /mnt/d/nanoporeData/20…
2 NA      NA         NA                   NA          NA /mnt/d/nanoporeData/20…
3 NA      NA         NA                   NA          NA /mnt/d/nanoporeData/20…
4 NA      NA         NA                   NA          NA /mnt/d/nanoporeData/20…
5 NA      NA         NA                   NA          NA /mnt/d/nanoporeData/20…
6 NA      NA         NA                   NA          NA /mnt/d/nanoporeData/20…

adnaniazi commented 2 years ago

Hi,

Thank you for using tailfindr.

I suspect that you have not installed the VBZ plugin. Please install it and then test tailfindr on a small subset of fast5 files to see whether it works.

Follow these instructions to install the VBZ plugin: https://community.nanoporetech.com/posts/inspecting-fast5-from-flon

Adnan

maximus-sci commented 2 years ago

Thank you. The plugin was installed, but the installation does not automatically tell the OS where it lives. Setting the HDF5_PLUGIN_PATH environment variable worked. On my system, the plugin was in /usr/local/hdf5/lib/plugin
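
For anyone who hits the same all-NA output, here is a minimal sketch of that fix done from within R. It assumes the plugin directory is the one reported above (/usr/local/hdf5/lib/plugin), which may differ on other systems. It also assumes the variable has to be set before tailfindr (and the HDF5 library it uses) is loaded in the session. The two helper functions, set_vbz_plugin_path() and tails_all_missing(), are hypothetical names made up for this example and are not part of tailfindr.

```r
# Hypothetical helper (not part of tailfindr): point HDF5 at the directory
# that holds the VBZ plugin. The default path is the one reported in this
# thread; adjust it for your own system.
set_vbz_plugin_path <- function(plugin_dir = "/usr/local/hdf5/lib/plugin") {
  if (!dir.exists(plugin_dir)) {
    stop("Plugin directory not found: ", plugin_dir)
  }
  Sys.setenv(HDF5_PLUGIN_PATH = plugin_dir)
  invisible(Sys.getenv("HDF5_PLUGIN_PATH"))
}

# Hypothetical helper: TRUE when the result has rows but every read_id is NA,
# which is the symptom described in this issue.
tails_all_missing <- function(df) {
  nrow(df) > 0 && all(is.na(df$read_id))
}

# Intended usage, in a fresh R session (assumption: set the variable first,
# then load the package and re-run find_tails() with your own arguments):
#   set_vbz_plugin_path()
#   library(tailfindr)
#   df <- find_tails(fast5_dir = <your fast5 dir>, save_dir = <your save dir>,
#                    csv_filename = <your csv name>, num_cores = 4)
#   if (tails_all_missing(df)) warning("Still all NA; check the plugin path")
```

Setting the variable in the shell before starting R (for example in ~/.bashrc) should have the same effect and also covers tools other than R.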