bgruening / galaxytools

:microscope::books: Galaxy Tool wrappers
MIT License

Compound conversion: Add index to output #1189

Open hechth opened 2 years ago

hechth commented 2 years ago

The compound conversion tool, which is part of the chemical toolbox, doesn't keep indices or other identifiers for the files it processes, and it silently drops invalid lines. This makes working with larger files problematic, as the output can no longer be associated with the inputs.

Is there a way to add indices to the files to indicate which output belongs to which input, or is the only option to run collections and have one identifier per job?

bgruening commented 2 years ago

@hechth are you talking about this tool? https://github.com/bgruening/galaxytools/blob/master/chemicaltoolbox/openbabel/ob_convert.xml

Which input format are you using?

hechth commented 2 years ago

@bgruening Indeed!

I'm using a plain list of InChI strings, i.e. the format that is called inchi.

Some example data is attached. inchi.zip

bgruening commented 2 years ago

Can you try adding an additional column (https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/devteam/add_value/addValue/1.0.0) to the inchi file? Is that preserved by openbabel?

hechth commented 2 years ago

I tried using two columns separated by a comma, but that didn't change anything; see this history (https://umsa.cerit-sc.cz/u/hechth/h/compound-convert-test).

bgruening commented 2 years ago

Try adding a new column separated by a tab, using the tool linked above.

hechth commented 2 years ago

Nope - tried adding a column manually, using tabs, commas, the Galaxy tool, but always the same - no index in the output and invalid data gets dropped silently.

bgruening commented 2 years ago

Maybe @simonbray has an idea? This tool simply uses openbabel, so if openbabel cannot deal with this, I think we are out of luck here.

simonbray commented 2 years ago

Can you use a different file format? I think inchi is in general not a good choice for the input.

With smiles or sdf you can specify the index in the molecule name/title.
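To illustrate the point about titles: in a SMILES file, whatever follows the first whitespace on a line is treated as the molecule title, so an index can travel with each record. The sketch below is only an illustration in plain Python; the function name `add_index_titles` and the `mol_` prefix are made up for this example and are not part of the Galaxy tool.

```python
from typing import Iterable, List


def add_index_titles(smiles_lines: Iterable[str], prefix: str = "mol_") -> List[str]:
    """Return SMILES lines whose title is the 1-based input line number.

    Hypothetical helper: any existing title is replaced by the index.
    """
    out = []
    for number, raw in enumerate(smiles_lines, start=1):
        fields = raw.split()
        if not fields:
            # Blank line: nothing to emit, but the counter still advances,
            # so titles keep matching the original line numbers.
            continue
        # First field is the SMILES string; the title goes after a tab.
        out.append(f"{fields[0]}\t{prefix}{number}")
    return out


if __name__ == "__main__":
    for line in add_index_titles(["CCO\n", "c1ccccc1 benzene\n"]):
        print(line)
```

Because the title is carried through by the converter for formats that have one, the index would survive the conversion. InChI input has no such title field, which is the limitation discussed here.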

hechth commented 2 years ago

I explicitly want the inchi, since I want to compute smiles from inchi.

I also don't get why indexing is possible with SMILES and not with inchi? They're both just texts ...

simonbray commented 2 years ago

> I also don't get why indexing is possible with SMILES and not with inchi? They're both just texts ...

What I meant is that SMILES has a name/title/label to which you can append an index.

> I explicitly want the inchi, since I want to compute smiles from inchi.

I think, as @bgruening said, we are limited by the underlying software. Maybe you can use a Galaxy workaround like this? https://usegalaxy.eu/u/sbray/h/inchi-index

hechth commented 2 years ago

In this scenario the join works because the inchi doesn't change. But if we actually change specific parts of it, the strings are no longer identical, so the workaround doesn't work.
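One way around the join problem, sketched here purely as an illustration, is to convert line by line and carry the input line number along, so the association never depends on comparing strings. This is not what `ob_convert.xml` does; `convert_with_index` and `inchi_to_smiles` are hypothetical names, and the second function assumes the Open Babel 3.x Python bindings (`from openbabel import pybel`) are installed.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def convert_with_index(
    lines: Iterable[str],
    convert: Callable[[str], Optional[str]],
) -> Iterator[Tuple[int, str, Optional[str]]]:
    """Yield (line_number, input, output) for every input line.

    Inputs that fail to convert are reported with output None
    instead of being dropped silently.
    """
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            yield number, text, None
            continue
        try:
            result = convert(text)
        except Exception:
            # Broad catch is deliberate in this sketch: any failure of the
            # converter marks the row as invalid rather than aborting.
            result = None
        yield number, text, result


def inchi_to_smiles(inchi: str) -> Optional[str]:
    """Convert one InChI string to SMILES (assumes Open Babel 3.x pybel)."""
    from openbabel import pybel  # imported lazily; an assumption of this sketch

    try:
        mol = pybel.readstring("inchi", inchi)
    except IOError:
        return None
    # The "smi" writer emits the SMILES, a tab, the title and a newline.
    fields = mol.write("smi").split()
    return fields[0] if fields else None
```

With Open Babel available, `convert_with_index(open("input.inchi"), inchi_to_smiles)` would yield one row per input line (the file name is a placeholder), and rows with `None` as output mark the invalid entries. Converting one string at a time is likely slower than a single batch run, which is the trade-off for keeping the index.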

If I come up with a solution, should I just PR it here? Otherwise, I think I could solve our specific needs with a targeted tool.

Thank you very much for your support and for looking into this!

simonbray commented 2 years ago

Yes, PRs are always welcome, thanks!