Open PBGLMichaelHall opened 2 years ago
Poretools was created and developed at a time when fast5 files only had one read per file. Based on the file names, I'd guess you're looking at recent multi-fast5 files (probably from a Flongle), which have multiple reads per file. ONT does provide utilities in their GitHub repository to convert from one format to the other, but I expect you'll get a better outcome for what you want by looking directly at the read summary output from basecalling.
OK....
git clone https://github.com/nanoporetech/ont_fast5_api
pip install ./ont_fast5_api
python multi_to_single_fast5.py -i path/to-multi-fast5/directory -s some/output/directory
poretools times /some/output/directory
WARNING:poretools:No start time for fast5.fast5!
WARNING:poretools:No start time for fast5.fast5!
WARNING:poretools:No start time for fast5.fast5!
WARNING:poretools:No start time for fast5.fast5!
. . . .
It can find keyinfo now but not start times after converting from multi to single!
I need specific columns of data that poretools times generates and that are not in the sequencing summary text file produced by a MinION run. A Python script reads these columns by name. Is there a way to generate them from the sequencing summary without using poretools times? The columns that are needed but not currently generated are:
exp_starttime unix_timestamp unix_timestamp_end iso_timestamp read_length day hour minute
The columns that the sequencing summary text file from a MinION run does contain:
filename read_id run_id batch_id channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_template median_template mad_template scaling_median_template scaling_mad_template
I'll repeat that it's really not a great idea to use this old software for processing new data. It seems odd to need UNIX timestamp values (and derived values) for every single read.
ONT changed their time representation between different versions, and may have altered other things with FAST5 files. I think they changed from absolute time to relative time, so adding unix timestamp values would require fetching the experiment start time from the sequencing logs.
Or you could add a constant timestamp value of 1st January 2000 to everything, to make it really obvious that the timestamps are incorrect.
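The calculation described above could be sketched roughly as follows. This is only an illustration under several assumptions: that start_time and duration in the sequencing summary are seconds relative to the start of the run, that the experiment start time has been looked up separately (for example from the sequencing logs) and is passed in as an ISO 8601 string, and that day, hour and minute mean the UTC calendar fields of the read start, which may not match what poretools times reports. The function names read_summary and add_timestamps are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def read_summary(handle):
    """Read a tab-separated sequencing summary into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def add_timestamps(rows, exp_start_iso):
    """Derive absolute timestamps for each read.

    ASSUMPTION: start_time and duration are seconds relative to the
    experiment start, and exp_start_iso is supplied by the user.
    """
    exp_start = datetime.fromisoformat(exp_start_iso)
    if exp_start.tzinfo is None:
        # Treat a naive start time as UTC (an assumption).
        exp_start = exp_start.replace(tzinfo=timezone.utc)
    exp_start_unix = int(exp_start.timestamp())

    out = []
    for row in rows:
        start = exp_start_unix + float(row["start_time"])
        end = start + float(row["duration"])
        start_dt = datetime.fromtimestamp(start, tz=timezone.utc)
        out.append({
            "exp_starttime": exp_start_unix,
            "unix_timestamp": int(start),
            "unix_timestamp_end": int(end),
            "iso_timestamp": start_dt.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "read_length": int(row["sequence_length_template"]),
            # ASSUMPTION: UTC calendar fields of the read start.
            "day": start_dt.day,
            "hour": start_dt.hour,
            "minute": start_dt.minute,
        })
    return out


# Tiny in-memory example standing in for a real summary file.
demo = io.StringIO(
    "read_id\tstart_time\tduration\tsequence_length_template\n"
    "read-1\t90.5\t2.25\t1000\n"
)
result = add_timestamps(read_summary(demo), "2021-03-01T12:00:00+00:00")
print(result[0])
```

For a real run, open the summary file and pass the handle to read_summary instead of the in-memory example. If the experiment start time cannot be recovered, the same function works with an obviously fake constant start time, as suggested above.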
Completely agree that this is no longer the toolset to use here. I need to update the README and make it obvious that poretools is deprecated owing to all of the ONT changes.
Which version of poretools has the correct time representation (UNIX)?