
OPSIN - Open Parser for Systematic IUPAC Nomenclature

Version 2.8.0 (see ReleaseNotes.txt for what's new in this version)
Source code: https://github.com/dan2097/opsin
Web interface and informational site: https://opsin.ch.cam.ac.uk/
License: MIT License

OPSIN is a Java library for IUPAC name-to-structure conversion offering high recall and precision on organic chemical nomenclature.

Java 8 (or higher) is required for OPSIN 2.8.0.

Supported outputs are SMILES, CML (Chemical Markup Language) and InChI (IUPAC International Chemical Identifier).

Simple Usage Examples

Convert a chemical name to SMILES

java -jar opsin-cli-2.8.0-jar-with-dependencies.jar -osmi input.txt output.txt
where input.txt contains chemical name/s, one per line

NameToStructure nts = NameToStructure.getInstance();
String smiles = nts.parseToSmiles("acetamide");
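
For batch use from library code, the one-name-per-line behaviour of the command line can be approximated with a small wrapper. The following is a minimal sketch, not part of OPSIN: `BatchConverter`, `BatchResult` and `convertAll` are hypothetical names, the converter is passed in as a `Function<String, String>` so the sketch compiles without OPSIN on the classpath, and it assumes the converter returns null for a name it cannot interpret. Writing an empty line for a failed name is a choice made here to keep input and output aligned; the command line's own behaviour may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: converts a list of names, one result line per input line. */
final class BatchConverter {

    /** Output lines plus the names that could not be converted. */
    static final class BatchResult {
        final List<String> outputLines = new ArrayList<>();
        final List<String> failedNames = new ArrayList<>();
    }

    /**
     * Applies the converter to every name.
     * Assumption: the converter returns null when a name cannot be interpreted.
     */
    static BatchResult convertAll(List<String> names, Function<String, String> converter) {
        BatchResult result = new BatchResult();
        for (String rawName : names) {
            String name = rawName.trim();
            if (name.isEmpty()) {
                // Keep blank lines so that input and output line numbers match.
                result.outputLines.add("");
                continue;
            }
            String converted = converter.apply(name);
            if (converted == null) {
                // Record the failure and emit an empty line in its place.
                result.failedNames.add(name);
                result.outputLines.add("");
            } else {
                result.outputLines.add(converted);
            }
        }
        return result;
    }
}
```

With OPSIN on the classpath, `nts::parseToSmiles` can be passed as the converter, for example `BatchConverter.convertAll(names, nts::parseToSmiles)`.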

Convert a chemical name to CML

java -jar opsin-cli-2.8.0-jar-with-dependencies.jar -ocml input.txt output.txt
where input.txt contains chemical name/s, one per line

NameToStructure nts = NameToStructure.getInstance();
String cml = nts.parseToCML("acetamide");
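
The returned CML is an XML string, so it can be inspected with the JDK's own XML parser. The sketch below counts atoms per element. `CmlAtomCounter` is a hypothetical helper, and it assumes that atoms appear as `atom` elements carrying an `elementType` attribute, as is conventional in CML; check this against the actual output before relying on it.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: counts atoms per element symbol in a CML string. */
final class CmlAtomCounter {

    static Map<String, Integer> countElements(String cml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness lets us match "atom" whatever namespace it is in.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(cml)));

            // Assumption: each atom is an <atom> element with an elementType attribute.
            NodeList atoms = doc.getElementsByTagNameNS("*", "atom");
            Map<String, Integer> counts = new TreeMap<>();
            for (int i = 0; i < atoms.getLength(); i++) {
                String symbol = ((Element) atoms.item(i)).getAttribute("elementType");
                counts.merge(symbol, 1, Integer::sum);
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse CML", e);
        }
    }
}
```

The result is a sorted map from element symbol to atom count, which is convenient for a quick sanity check of a converted structure.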

Convert a chemical name to StdInChI/StdInChIKey/InChI with FixedH

java -jar opsin-cli-2.8.0-jar-with-dependencies.jar -ostdinchi input.txt output.txt
java -jar opsin-cli-2.8.0-jar-with-dependencies.jar -ostdinchikey input.txt output.txt
java -jar opsin-cli-2.8.0-jar-with-dependencies.jar -oinchi input.txt output.txt
where input.txt contains chemical name/s, one per line

NameToInchi nti = new NameToInchi();
String stdInchi = nti.parseToStdInchi("acetamide");
String stdInchiKey = nti.parseToStdInchiKey("acetamide");
String inchi = nti.parseToInchi("acetamide");

NOTE: OPSIN's non-standard InChI includes an additional layer (FixedH) that indicates which tautomer the chemical name described. StdInChI aims to be tautomer independent.
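
Whether a given string is a StdInChI or a non-standard InChI with a FixedH layer can be told from the string itself. The sketch below is a hypothetical helper (`InchiLayers` is not part of OPSIN). It relies on the InChI conventions that standard identifiers begin with `InChI=1S/` and that the fixed-hydrogen layer is introduced by `/f`.

```java
/** Hypothetical helper: inspects the layers of an InChI string by plain string handling. */
final class InchiLayers {

    /** Standard InChI identifiers carry an "S" after the version number. */
    static boolean isStandard(String inchi) {
        return inchi.startsWith("InChI=1S/");
    }

    /** True if any layer after the version prefix starts with 'f' (fixed-hydrogen layer). */
    static boolean hasFixedHLayer(String inchi) {
        String[] layers = inchi.split("/");
        // Index 0 is the version prefix; the formula layer starts with an
        // upper-case element symbol, so it can never be mistaken for "f".
        for (int i = 1; i < layers.length; i++) {
            if (layers[i].startsWith("f")) {
                return true;
            }
        }
        return false;
    }
}
```

A check like this can be used to confirm that the output of `parseToInchi` retains the tautomer information that `parseToStdInchi` is designed to leave out.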

Advanced Usage

OPSIN 2.8.0 allows enabling of the following options:

The usage of these options on the command line is described in the command line's help dialog accessible via: java -jar opsin-cli-2.8.0-jar-with-dependencies.jar -h

These options may be controlled using the following code:

NameToStructure nts = NameToStructure.getInstance();
NameToStructureConfig ntsconfig = new NameToStructureConfig();
//a new NameToStructureConfig starts as a copy of OPSIN's default configuration
ntsconfig.setAllowRadicals(true);
OpsinResult result = nts.parseChemicalName("acetamide", ntsconfig);
String cml = result.getCml();
String smiles = result.getSmiles();
String stdinchi = NameToInchi.convertResultToStdInChI(result);

result.getStatus() may be checked to see whether the conversion was successful. If a structure was generated but OPSIN believes there may be a problem, a status of WARNING is returned. Currently this may occur if the name appeared to be ambiguous or if stereochemistry was ignored. By default only optical rotation specification is ignored, since it cannot be converted to a stereo-configuration algorithmically.

Convenience methods like result.nameAppearsToBeAmbiguous() may be used to check the cause of the warning.
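
One way to act on the status is a small triage step that decides whether a result is used directly, sent for manual review or discarded. The following is a sketch of such a policy, not OPSIN code: `StatusTriage`, `Status` and `Action` are hypothetical, the local `Status` enum merely stands in for the library's status values so that the sketch compiles without OPSIN (SUCCESS and FAILURE are assumed alongside the WARNING described above), and the policy itself is only an example.

```java
/** Hypothetical triage policy for conversion results. */
final class StatusTriage {

    /** Stand-in for the library's status values; not the library's own type. */
    enum Status { SUCCESS, WARNING, FAILURE }

    /** What the caller does with a result. */
    enum Action { ACCEPT, REVIEW, REJECT }

    /**
     * @param status               status reported for the conversion
     * @param nameAppearsAmbiguous value of the ambiguity convenience check
     */
    static Action triage(Status status, boolean nameAppearsAmbiguous) {
        switch (status) {
            case SUCCESS:
                return Action.ACCEPT;
            case WARNING:
                // Example policy: ambiguous names go to a human; any other
                // warning is taken to mean that stereochemistry was ignored
                // and the structure is accepted as it stands.
                return nameAppearsAmbiguous ? Action.REVIEW : Action.ACCEPT;
            default:
                // No usable structure was produced.
                return Action.REJECT;
        }
    }
}
```

In real use the two arguments would come from `result.getStatus()` and `result.nameAppearsToBeAmbiguous()`; applications that depend on stereochemistry may prefer to review every WARNING.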

NOTE: (Std)InChI cannot be generated for polymers, or for radicals generated in combination with the wildcardRadicals option.

Availability

OPSIN is available as a standalone JAR from GitHub, https://github.com/dan2097/opsin/releases

OPSIN is also available from the Maven Central Repository. For SMILES/CML output support you would include:

<dependency>
   <groupId>uk.ac.cam.ch.opsin</groupId>
   <artifactId>opsin-core</artifactId>
   <version>2.8.0</version>
</dependency>

or if you also need InChI output support:

<dependency>
   <groupId>uk.ac.cam.ch.opsin</groupId>
   <artifactId>opsin-inchi</artifactId>
   <version>2.8.0</version>
</dependency>

Building from source

To build OPSIN from source, download Maven 3 and OPSIN's source code.

Running mvn package in the root of OPSIN's source will build:

| Artifact | Location | Description |
|---|---|---|
| opsin-cli-\<version>-jar-with-dependencies.jar | opsin-cli/target | Standalone command-line application with SMILES/CML/InChI support |
| opsin-core-\<version>-jar-with-dependencies.jar | opsin-core/target | Library with SMILES/CML support |
| opsin-inchi-\<version>-jar-with-dependencies.jar | opsin-inchi/target | Library with SMILES/CML/InChI support |

About OPSIN

The workings of OPSIN are more fully described in:

Chemical Name to Structure: OPSIN, an Open Source Solution
Daniel M. Lowe, Peter T. Corbett, Peter Murray-Rust, Robert C. Glen
Journal of Chemical Information and Modeling 2011 51 (3), 739-753

If you use OPSIN in your work, then it would be great if you could cite us.

The following list broadly summarises what OPSIN can currently do and what will be worked on in the future.

Supported nomenclature includes:

Currently UNsupported nomenclature includes:

Developers and Contributors

Thanks also to the many users who have contributed through suggestions and bug reporting.


OPSIN's developers use YourKit to profile and optimise code.

YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.

Good luck, and let us know if you have problems, comments or suggestions! Bugs may be reported on the project's issue tracker.