Isoform2canonical mapper: data basis

ReneRanzinger commented 3 months ago

The isoform2canonical mapper from Preethi does not use any API to retrieve the sequences for isoforms or canonical proteins. Instead it should come with its own sequence data basis (not a database; I actually think it's a file or a folder of files):

Please check the location of the sequence data.
Does the sequence data contain all of UniProt or just GlyGen sequences/proteins?
If the files are inside the Docker container, it would be good to move them into a volume that can be mounted by the container. That would allow the user of the container (e.g. GlyGen) to provide their own sequence data basis (in the right format and file/folder structure). This has two advantages: the container can also be used by other groups that deal with a different set of proteins, and the responsibility for keeping the sequence data basis up to date moves from the Docker container to the user of the container (e.g. GlyGen).

jieluo321 commented 2 months ago

The sequence data only contains the proteins of the proteomes in GlyGen. The Docker image contains all of these sequence files.

I can take a look in the future at separating the sequence files from the Java tool.

ReneRanzinger commented 2 months ago

As @rykahsay mentioned, one option would be a Docker volume: essentially a host folder that gets mounted into the container. To the Java program it still looks as if it is reading the files from its own file system, but the host can also access and update these files without the container even knowing about it.
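
To illustrate the idea, here is a minimal sketch of how the Java side could locate the sequence files once they live in a mounted folder. This is not the mapper's actual code. The class name `SequenceDataLocator`, the environment variable `SEQUENCE_DATA_DIR`, the default path `/data/sequences`, and the `.fasta` extension are all assumptions made for the example, since the real file format and folder structure still need to be checked.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SequenceDataLocator {
    // Hypothetical names: neither the variable nor the default path comes from the real tool.
    static final String ENV_VAR = "SEQUENCE_DATA_DIR";
    static final String DEFAULT_DIR = "/data/sequences";

    // Pick the directory from the environment, falling back to the default mount point.
    static Path resolveDataDir(Map<String, String> env) {
        String configured = env.get(ENV_VAR);
        boolean missing = configured == null || configured.trim().isEmpty();
        return Paths.get(missing ? DEFAULT_DIR : configured.trim());
    }

    // List the sequence files in the directory. The ".fasta" filter is an assumption
    // about the file format; the code reads the mount like any local folder.
    static List<Path> listSequenceFiles(Path dataDir) throws IOException {
        if (!Files.isDirectory(dataDir)) {
            throw new IOException("Sequence data directory not found: " + dataDir);
        }
        try (Stream<Path> entries = Files.list(dataDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".fasta"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        Path dataDir = resolveDataDir(System.getenv());
        try {
            for (Path file : listSequenceFiles(dataDir)) {
                System.out.println(file);
            }
        } catch (IOException e) {
            // Fail with a clear message when no volume has been mounted.
            System.err.println(e.getMessage());
        }
    }
}
```

With something like this in place, the container could be started with `docker run -v /host/sequences:/data/sequences -e SEQUENCE_DATA_DIR=/data/sequences <image>`, where the host path and the image name are placeholders. Because the directory is listed at call time, files that the host adds or replaces are picked up the next time the tool reads the folder.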

glygener / glygen-issues

Isoform2canonical mapper: data basis #1675