jacob-r-anderson closed this issue 4 years ago
Closing the issue. The input had some compounds with three columns, and those lines were excluded by this line:
input_data = [x for x in input_data if len(x) == 2]
Removed the redundant name column with:
awk '{print $1,$2}' input.smi > finput.smi
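To illustrate the two steps above, here is a minimal Python sketch. It assumes each input line is split on whitespace before the length check (the quoted filter line does not show how the rows are built), and the function names `keep_two_columns` and `first_two_columns` are hypothetical, not part of rd_filters.

```python
def keep_two_columns(lines):
    """Mimic the quoted filter: keep only rows with exactly two fields.

    Assumption: rows are produced by whitespace splitting.
    """
    rows = [line.split() for line in lines]
    return [row for row in rows if len(row) == 2]


def first_two_columns(lines):
    """Rough Python counterpart of awk '{print $1,$2}'.

    Unlike awk, this skips lines that have fewer than two fields.
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            out.append(f"{fields[0]} {fields[1]}")
    return out


sample = [
    "CCO ethanol",  # two columns: SMILES and name
    "n1cc(O)ccc1CC(=O)N(CC2=O)CCCN2CC 319374292 319374292",  # three columns
]

# The three-column row is silently dropped by the length check.
print(len(keep_two_columns(sample)))

# After trimming to two columns, both rows would survive the same check.
trimmed = first_two_columns(sample)
print(len(keep_two_columns(trimmed)))
```

Trimming the extra column before filtering means the length check no longer discards those compounds.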
On a large SMILES file the program seems to end early (< 10% of the way through).
(my-rdkit-env) [me]$ wc -l 01_split.smi
7113315 01_split.sm
(my-rdkit-env) [Me]$ rd_filters filter --in 01_split.smi --prefix 01_filtered --rules rules.json --alerts alert.csv
using 4 cores
Using alerts from Inpharmatica and PAINS
[09:05:36] Explicit valence for atom # 1 N, 5, is greater than permitted
[09:06:02] Conflicting single bond directions around double bond at index 22.
[09:06:02] BondStereo set to STEREONONE and single bond directions set to NONE.
[09:06:42] Conflicting single bond directions around double bond at index 22.
[09:06:42] BondStereo set to STEREONONE and single bond directions set to NONE.
[09:07:08] Conflicting single bond directions around double bond at index 22.
[09:07:08] BondStereo set to STEREONONE and single bond directions set to NONE.
[09:07:56] Conflicting single bond directions around double bond at index 22.
[09:07:56] BondStereo set to STEREONONE and single bond directions set to NONE.
Wrote SMILES for molecules passing filters to 01_filtered.smi
Wrote detailed data to 01_filtered.csv
13281 of 82704 passed filters 16.1%
Elapsed time 197.91 seconds
I looked in the input file at lines above and below 82704 and nothing seems to be awry.
c1csc(c12)CCN([C@@H]2CC)C(=O)NC@Hc3c(C)nn(C)c3 316831704 316831704 - 82703
CCC(CC)C@@HC(=O)Nc(cn(n1)C)c1-c2ccnn2C 319220015 319220015 - 82704
n1cc(O)ccc1CC(=O)N(CC2=O)CCCN2CC 319374292 319374292 - 82705
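One way to check a file like this is to tally how many whitespace-separated fields each line has and compare the tallies with the total the program reports. The sketch below assumes whitespace-delimited input; `field_count_histogram` and `histogram_for_file` are hypothetical helper names, and the sample rows reuse two records quoted above plus one made-up two-column row.

```python
from collections import Counter


def field_count_histogram(lines):
    """Count how many lines have 0, 1, 2, 3, ... whitespace-separated fields."""
    return Counter(len(line.split()) for line in lines)


def histogram_for_file(path):
    """Same tally, streamed from a file so large inputs are not loaded at once."""
    with open(path) as handle:
        return field_count_histogram(handle)


sample = [
    "CCO ethanol",  # made-up two-column row for illustration
    "CCC(CC)C@@HC(=O)Nc(cn(n1)C)c1-c2ccnn2C 319220015 319220015",
    "n1cc(O)ccc1CC(=O)N(CC2=O)CCCN2CC 319374292 319374292",
]

hist = field_count_histogram(sample)
for n_fields, n_lines in sorted(hist.items()):
    print(f"{n_fields} fields: {n_lines} lines")
```

If the number of two-field lines matches the reported total while most lines have three fields, the extra column is the likely reason the run appears to stop early.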