PatWalters / rd_filters

A script to run structural alerts using the RDKit and ChEMBL
MIT License

more unixish behavior #16

Open UnixJunkie opened 2 years ago

UnixJunkie commented 2 years ago

Hello,

The default behavior should be the filter command, because the other commands should only rarely be needed. -i: input file. -o: output file, in the same format as the input (i.e. the molecules that passed all filters). Molecules that are filtered out should be sent to stderr, or at least go to .errors (but this is less unixish).
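The stream behavior described above can be sketched as follows. This is only an illustration of the stdout/stderr split, not code from rd_filters: `split_stream` and `toy_filter` are hypothetical names, and the toy predicate is a plain string test standing in for the real structural-alert matching.

```python
import sys
from typing import Callable, Iterable, Optional, TextIO


def split_stream(
    lines: Iterable[str],
    passes: Callable[[str], bool],
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> int:
    """Write passing records to `out` and rejected ones to `err`.

    Streams default to sys.stdout / sys.stderr, resolved at call time.
    Returns the number of records that passed.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    n_passed = 0
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue  # ignore blank lines
        if passes(record):
            print(record, file=out)
            n_passed += 1
        else:
            print(record, file=err)
    return n_passed


def toy_filter(record: str) -> bool:
    # Hypothetical stand-in for a structural alert: reject any line whose
    # first column (the SMILES) contains the substring "N(=O)=O".
    return "N(=O)=O" not in record.split()[0]
```

Calling `split_stream(sys.stdin, toy_filter)` from a script would let the tool sit in a shell pipeline, with rejected molecules redirected separately via `2> rejected.smi`.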

I could contribute this if you are interested, but I need to look at the code and code quality to see if this can easily be done.

Regards, F.

PatWalters commented 2 years ago

I agree that the command line interface needs to be revamped. What about something like this? It would do away with the config file and allow users to specify variations from the defaults on the command line.

Usage:
rd_filters -i INPUT_FILE -o OUTPUT_FILE [--alert ALERT_SET] [--cfg CONFIG_FILE] [--rdalert ALERT_FILE] [--np NUM_CORES]
           [--mw MW_LIMIT] [--logp LOGP_LIMIT] [--hbd HBD_LIMIT] [--hba HBA_LIMIT] [--tpsa TPSA_LIMIT] [--filter FILTER_NAME]
rd_filters --wrconfig CONFIG_FILE
rd_filters --show

Options:
-i --in INPUT_FILE        input file name
-o --out OUTPUT_FILE      output file name, rule failures go to OUTPUT_FILE.err
--alert ALERT_SET_NAME    specify structural alert set to use
--cfg CONFIG_FILE         read a configuration file with settings
--rdalert ALERT_FILE      read structural alerts from a file
--np NUM_CORES            the number of CPU cores to use (default is all)
--mw MW_LIMIT             molecular weight limit, input in quotes, default "0 500"
--logp LOGP_LIMIT         logP limit, input in quotes, default "0 5"
--hbd HBD_LIMIT           hydrogen bond donor limit, input in quotes, default "0 5"
--hba HBA_LIMIT           hydrogen bond acceptor limit, input in quotes, default "0 10"
--rot ROTOR_LIMIT         rotatable bond limit, input in quotes, default "0 10"
--tpsa TPSA_LIMIT         topological polar surface area limit,
--wrconfig CONFIG_FILE    write a configuration file with the default settings
--show                    show available structural alert sets

UnixJunkie commented 2 years ago

I think it's great. On Unix the convention is {-i|--input} and {-o|--output}; yes, OpenEye has it wrong too. Default input should be stdin; default output stdout. Importantly, we must eventually accept several --alert options on the command line (maybe {-a|--alerts}).
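For reference, here is one way a repeatable alert option with stdin/stdout defaults could look using Python's standard `argparse` module. This is a sketch under assumptions: the thread does not say which parsing library would be used, `build_parser` is a hypothetical name, and treating a missing `-i`/`-o` as "use stdin/stdout" is just one convention.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="rd_filters")
    # None is used here as the marker for "read stdin" / "write stdout".
    parser.add_argument("-i", "--input", default=None, metavar="INPUT_FILE",
                        help="input file name (default: stdin)")
    parser.add_argument("-o", "--output", default=None, metavar="OUTPUT_FILE",
                        help="output file name (default: stdout)")
    # action="append" collects every occurrence of -a/--alert into a list.
    parser.add_argument("-a", "--alert", action="append", default=None,
                        metavar="ALERT_SET",
                        help="structural alert set; may be given several times")
    return parser
```

With this layout, `rd_filters -a bms -a glaxo < in.smi > passed.smi` would yield `args.alert == ["bms", "glaxo"]` while both file arguments stay unset.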

PatWalters commented 2 years ago

Updated, I'll create a fork this weekend.

Usage:
rd_filters [-i INPUT_FILE] [-o OUTPUT_FILE] [-a ALERT_SET] [--cfg CONFIG_FILE] [--rdalert ALERT_FILE] [--np NUM_CORES]
           [--mw MW_LIMIT] [--logp LOGP_LIMIT] [--hbd HBD_LIMIT] [--hba HBA_LIMIT] [--tpsa TPSA_LIMIT] [--filter FILTER_NAME]
rd_filters --wrconfig CONFIG_FILE
rd_filters --show

Options:
-i --input INPUT_FILE     input file name, defaults to stdin
-o --output OUTPUT_FILE   output file name, rule failures go to OUTPUT_FILE.err, defaults to stdout and stderr
-a --alert ALERT_SET_NAME specify structural alert set to use, multiple sets can be specified in quotes, e.g. "bms glaxo"
--cfg CONFIG_FILE         read a configuration file with settings
--rdalert ALERT_FILE      read structural alerts from a file
--np NUM_CORES            the number of CPU cores to use (default is all)
--mw MW_LIMIT             molecular weight limit, input in quotes, default "0 500"
--logp LOGP_LIMIT         logP limit, input in quotes, default "0 5"
--hbd HBD_LIMIT           hydrogen bond donor limit, input in quotes, default "0 5"
--hba HBA_LIMIT           hydrogen bond acceptor limit, input in quotes, default "0 10"
--rot ROTOR_LIMIT         rotatable bond limit, input in quotes, default "0 10"
--tpsa TPSA_LIMIT         topological polar surface area limit, default "0 200"
--wrconfig CONFIG_FILE    write a configuration file with the default settings
--show                    show available structural alert sets
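
The quoted "MIN MAX" limits and the quoted list of alert sets in the options above both need a small amount of parsing. A minimal sketch, assuming `argparse` and showing only three of the options; `limit_range` and `alert_sets` are hypothetical helper names, not part of rd_filters:

```python
import argparse
from typing import List, Tuple


def limit_range(text: str) -> Tuple[float, float]:
    """Parse a quoted 'MIN MAX' string such as "0 500" into (0.0, 500.0)."""
    parts = text.split()
    if len(parts) != 2:
        raise argparse.ArgumentTypeError(f"expected 'MIN MAX', got {text!r}")
    try:
        low, high = float(parts[0]), float(parts[1])
    except ValueError:
        raise argparse.ArgumentTypeError(f"limits must be numeric, got {text!r}")
    if low > high:
        raise argparse.ArgumentTypeError(f"MIN exceeds MAX in {text!r}")
    return low, high


def alert_sets(text: str) -> List[str]:
    """Split a quoted, space-separated list such as "bms glaxo"."""
    return text.split()


parser = argparse.ArgumentParser(prog="rd_filters")
# Defaults mirror the values quoted in the option list above.
parser.add_argument("--mw", type=limit_range, default=(0.0, 500.0))
parser.add_argument("--logp", type=limit_range, default=(0.0, 5.0))
parser.add_argument("-a", "--alert", type=alert_sets, default=[])
```

Raising `argparse.ArgumentTypeError` inside a `type` callable makes argparse print a usage error for malformed input such as `--mw 500`, rather than failing later during filtering.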