arq5x / poretools

a toolkit for working with Oxford nanopore data
MIT License

Docs or help: where to find the possible choices for options? And a possible source of crash. #154

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi,

Thanks for this tool, which is very useful.

I have one remark: it would be great to list the possible choices for each option in the help output.

user@host$ poretools occupancy --help

usage: poretools occupancy [-h] [-q] [--saveas STRING] [--plot-type STRING]
                           FILES [FILES ...]

positional arguments:
  FILES               The input FAST5 files.

optional arguments:
  -h, --help          show this help message and exit
  -q, --quiet         Do not output warnings to stderr
  --saveas STRING     Save the plot to a file. Extension (.pdf or .png) drives
                      type.
  --plot-type STRING  The type of plot to generate

To find the possible choices for --plot-type, I had to open the following code:

    ############
    # yield_plot
    ############
    parser_occupancy = subparsers.add_parser('occupancy',
                                        help='Inspect pore activity over time for a set of FAST5 files')
    parser_occupancy.add_argument('files', metavar='FILES', nargs='+',
                               help='The input FAST5 files.')
    parser_occupancy.add_argument('--saveas',
                             dest='saveas',
                             metavar='STRING',
                             help='Save the plot to a file. Extension (.pdf or .png) drives type.',
                             default=None)
    parser_occupancy.add_argument('--plot-type',
                             dest='plot_type',
                             metavar='STRING',
                             choices=['read_count', 'total_bp'],
                             help='The type of plot to generate',
                             default='read_count')

    parser_occupancy.set_defaults(func=run_subtool)

Additionally, it was a little hard to find the parser for the occupancy sub-command, because its comment banner says "yield_plot" rather than "occupancy".

I also had trouble with some reads that had not been basecalled (I don't know why yet; maybe they are 1D^2 complement reads, because I ran the command between the linear and the paired caller).
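One possible way to surface the choices in the help output, sketched here as a suggestion rather than as how poretools does it: argparse expands `%(choices)s` and `%(default)s` inside a help string, so the existing `choices` list can be reused without duplicating it. The `build_parser` helper and the `PLOT_TYPES` name are hypothetical, introduced only for this example; the argument definitions mirror the parser quoted above.

```python
import argparse

# Choices copied from the occupancy parser quoted above.
PLOT_TYPES = ['read_count', 'total_bp']


def build_parser():
    """Hypothetical helper: builds a cut-down 'occupancy' sub-command."""
    parser = argparse.ArgumentParser(prog='poretools')
    subparsers = parser.add_subparsers(dest='command')

    parser_occupancy = subparsers.add_parser(
        'occupancy',
        help='Inspect pore activity over time for a set of FAST5 files')
    parser_occupancy.add_argument('files', metavar='FILES', nargs='+',
                                  help='The input FAST5 files.')
    parser_occupancy.add_argument(
        '--plot-type',
        dest='plot_type',
        metavar='STRING',
        choices=PLOT_TYPES,
        # argparse fills in %(choices)s and %(default)s when rendering help.
        help='The type of plot to generate. One of: %(choices)s '
             '(default: %(default)s)',
        default='read_count')
    return parser, parser_occupancy


if __name__ == '__main__':
    _, occupancy = build_parser()
    occupancy.print_help()
```

An alternative would be to drop `metavar='STRING'`, in which case argparse shows the choices in the usage line as `{read_count,total_bp}`; keeping the metavar and extending the help string changes less of the existing output.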

        if fast5.is_open:
            fq = fast5.get_fastq()

            start_time = fast5.get_start_time()
            if start_time is None:
                logger.warning("No start time for %s!" % (fast5.filename))
                fast5.close()
                continue

Adding the following check to the code solved the problem. As a side benefit, you can get an idea of the pore occupancy before basecalling has finished.

            if fq is None:
                logger.warning("No fastq sequence for %s!" % (fast5.filename))
                fast5.close()
                continue
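Put together, the two guards might look roughly like this. It is a minimal, self-contained sketch: `FakeFast5` is a hypothetical stand-in for the poretools FAST5 object (it only mimics the attributes and methods used in the snippets above), and `usable_reads` is an illustrative name, not a function that exists in poretools.

```python
import logging

logger = logging.getLogger('poretools')


class FakeFast5(object):
    """Hypothetical stand-in for a poretools FAST5 file, for illustration only."""

    def __init__(self, filename, fastq, start_time):
        self.filename = filename
        self._fastq = fastq
        self._start_time = start_time
        self.is_open = True

    def get_fastq(self):
        return self._fastq

    def get_start_time(self):
        return self._start_time

    def close(self):
        self.is_open = False


def usable_reads(fast5_files):
    """Yield (fastq, start_time) for reads that have both; skip the rest."""
    for fast5 in fast5_files:
        if not fast5.is_open:
            continue

        fq = fast5.get_fastq()
        if fq is None:
            # Read not basecalled (yet): warn and move on instead of crashing.
            logger.warning("No fastq sequence for %s!" % (fast5.filename))
            fast5.close()
            continue

        start_time = fast5.get_start_time()
        if start_time is None:
            logger.warning("No start time for %s!" % (fast5.filename))
            fast5.close()
            continue

        yield fq, start_time
        fast5.close()
```

Checking `fq` before anything uses it means a partially basecalled run is simply reported with fewer reads instead of stopping at the first uncalled file.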

Finally, and this may be specific to my platform, I had to change the figsize in occupancy.py for the graph to be readable as a PDF.

plt.savefig(plot_file, figsize=(20, 20))

Best,