databio / pypiper

Python toolkit for building restartable pipelines
http://pypiper.databio.org
BSD 2-Clause "Simplified" License

Pypiper should support `force_overwrite` when reporting results? #201

Closed: donaldcampbelljr closed this issue 6 months ago

donaldcampbelljr commented 9 months ago

I noticed that when a pipeline is managed via pypiper and its results already exist, the pipeline fails. This happens because pypiper reports results through pipestat but does not make use of pipestat.report's force_overwrite parameter: https://github.com/databio/pypiper/blob/8aaede5d75f374fd475573dfd875e3d742c581ea/pypiper/manager.py#L1686-L1692

Example of error (while using PEPATAC):

These results exist for 'DEFAULT_SAMPLE_NAME': Read_type
Traceback (most recent call last):
  File "/home/drc/pepatac_tutorial//tools/pepatac/pipelines/pepatac.py", line 2779, in <module>
    sys.exit(main())
  File "/home/drc/pepatac_tutorial//tools/pepatac/pipelines/pepatac.py", line 731, in main
    pm.report_result("Read_type", args.single_or_paired)
  File "/home/drc/anaconda3/envs/pepatac38/lib/python3.8/site-packages/pypiper/manager.py", line 1615, in report_result
    for r in reported_result:
TypeError: 'bool' object is not iterable

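pipestat.report returns False if it cannot report the result. pypiper then tries to iterate over that boolean (`for r in reported_result:`), which raises the TypeError shown above; this is line 1691 in the linked code.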
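The failure mode can be reproduced without pypiper or pipestat. In this minimal sketch, `fake_report` and `print_results` are hypothetical stand-ins (not the real APIs), and the assumption is that a successful report returns some iterable of messages:

```python
def fake_report(already_exists):
    """Hypothetical stand-in for a reporter: False on refusal, else a list of messages."""
    if already_exists:
        return False
    return ["Reported Read_type"]


def print_results(reported_result):
    # Mirrors the loop in the traceback: iterates without checking for False.
    for r in reported_result:
        print(r)


try:
    print_results(fake_report(already_exists=True))
except TypeError as err:
    error_message = str(err)
    print(error_message)  # 'bool' object is not iterable
```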

Solution:
- Create a new parameter in pypiper that lets the user toggle force_overwrite, defaulting to False.
- Handle a False return value instead of crashing pypiper.
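A rough sketch of what the two points above might look like. `StubReporter` is a hypothetical stand-in for the pipestat backend and `report_result` is a simplified free function, not pypiper's actual method; the only behavior assumed from this issue is that reporting returns False when a result exists and force_overwrite is off.

```python
class StubReporter:
    """Hypothetical stand-in for the pipestat backend; not the real pipestat API."""

    def __init__(self):
        self.results = {}

    def report(self, values, force_overwrite=False):
        existing = [key for key in values if key in self.results]
        if existing and not force_overwrite:
            print(f"These results exist: {', '.join(existing)}")
            return False
        self.results.update(values)
        return [f"Reported {key}: {value}" for key, value in values.items()]


def report_result(reporter, key, value, force_overwrite=False):
    """Report one result; return True on success, False if the reporter refused."""
    reported = reporter.report({key: value}, force_overwrite=force_overwrite)
    if reported is False:
        # Handle the refusal explicitly instead of iterating over a bool.
        print("Result successfully reported? False")
        return False
    for message in reported:
        print(message)
    return True


reporter = StubReporter()
first = report_result(reporter, "Read_type", "paired")   # new result
second = report_result(reporter, "Read_type", "single")  # refused, no crash
third = report_result(reporter, "Read_type", "single", force_overwrite=True)
```

Defaulting force_overwrite to False keeps the current behavior of refusing to replace existing results, while callers that want to rerun a pipeline over old output can opt in.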

donaldcampbelljr commented 9 months ago

Example of output with suggested solution:

These results exist for 'DEFAULT_SAMPLE_NAME': Fragment distribution
Result successfully reported? False