Closed: breakthewall closed this issue 2 years ago.
Can you please provide some additional information about the nature of the problem? What tools are you using that depend on pySBOL in this way? In general, pySBOL2 should be usable as a drop-in replacement for pySBOL, and pySBOL2 is being actively maintained while pySBOL is not.
Many thanks for your (very) fast reply.
Actually, we are developing a suite of tools for synthetic biology (we have asked for such a category to be created in the main Galaxy Tool Shed). Some of these tools were developed with pySBOL (PartsGenie, doebase, LCRGenie, DNAWeaver). I tried replacing the sbol import with sbol2, but it is not straightforward and the code has to be modified. Since some of these tools are no longer under development, we have managed to keep using them with pySBOL.
In our workflow, doebase runs under both Linux and macOS. However, its output file (SBOL) is sometimes different and causes errors in the downstream tools (LCRGenie and DNAWeaver). The error is the same in both.
So I had a look at the doebase tool and saw that its behavior depends on the OS: it works well on macOS and fails on our Galaxy instance (Debian).
Taking a closer look, I observed that the doebase output file (SBOL) is well-formed (no error with downstream tools) when it is run under macOS or Ubuntu, but is somehow malformed (error with downstream tools) when it is run under Debian or CentOS.
I then isolated the piece of code that behaves differently, and it turns out that the method PartShop::pull has no effect on the Document for some (not all) parts (an example is given in my previous post). As a result, some sequences are missing from the output file, which causes errors further downstream.
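To check an output file for this symptom directly, the following is a minimal sketch using only the Python standard library that lists the ComponentDefinitions whose sequence is missing. It assumes the file is SBOL 2 RDF/XML using the `http://sbols.org/v2#` namespace; `find_missing_sequences` is a hypothetical helper, not part of pySBOL or doebase. Parts that legitimately have no sequence would also be reported.

```python
import xml.etree.ElementTree as ET

# Namespaces used by SBOL 2 RDF/XML files, in ElementTree "{uri}tag" notation.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SBOL = "{http://sbols.org/v2#}"


def find_missing_sequences(xml_data):
    """Return identities of ComponentDefinitions without a usable Sequence.

    A ComponentDefinition is reported when it has no sbol:sequence property,
    or when a Sequence it references is not defined in the same document.
    xml_data may be str or bytes (e.g. the raw contents of an output file).
    """
    root = ET.fromstring(xml_data)
    # Identities of all Sequence objects defined anywhere in the document.
    defined = {seq.get(RDF + "about") for seq in root.iter(SBOL + "Sequence")}
    missing = []
    for cd in root.iter(SBOL + "ComponentDefinition"):
        # sbol:sequence properties are direct children carrying rdf:resource.
        refs = [prop.get(RDF + "resource") for prop in cd.findall(SBOL + "sequence")]
        if not refs or any(ref not in defined for ref in refs):
            missing.append(cd.get(RDF + "about"))
    return missing
```

Usage: open the output file in binary mode and pass its contents, e.g. `find_missing_sequences(open("constructs.xml", "rb").read())`; an empty list means every ComponentDefinition points to a Sequence defined in the file.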
I checked the Python versions and conda environments, which are strictly identical on Debian and Ubuntu:
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
certifi 2021.10.8 pypi_0 pypi
charset-normalizer 2.0.10 pypi_0 pypi
distro 1.6.0 pypi_0 pypi
idna 3.3 pypi_0 pypi
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 11.2.0 h1d223b6_11 conda-forge
libgomp 11.2.0 h1d223b6_11 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
openssl 3.0.0 h7f98852_2 conda-forge
pip 21.3.1 pyhd8ed1ab_0 conda-forge
pysbol 2.3.3.post9 pypi_0 pypi
python 3.8.12 hf930737_2_cpython conda-forge
python_abi 3.8 2_cp38 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
requests 2.27.1 pypi_0 pypi
setuptools 60.5.0 py38h578d9bd_0 conda-forge
sqlite 3.37.0 h9cd32fc_0 conda-forge
tk 8.6.11 h27826a3_1 conda-forge
urllib3 1.26.8 pypi_0 pypi
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
@bbartley Do you think this is a wheel issue? Is there any good way to tweak that?
@breakthewall Is there any ability to build from source instead of installing via pip in your environment? That might be a way to get around it too.
The problem appears to be that the user can't fetch from SynBioHub on CentOS. My hunch is that the HTTPS protocol used by SynBioHub is not supported on this system. Because some Linux systems lack built-in crypto libraries (and CentOS might be one of these), some old wheels do not support HTTPS. In later wheel versions, I started statically linking crypto libraries into the pysbol binaries to get around this problem.
@breakthewall Prior to calling pull, can you call this method to enable logging?
import sbol
sbol.Config.setOption('verbose', True)
See if this confirms my hypothesis above, or provides any further hints about what the issue might be.
Also, can you determine which precise wheels are being installed in your different environments?
pip show pysbol
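If `pip show` alone is not conclusive, the following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) to report which wheel tag was actually installed, read from the `WHEEL` file in the distribution's metadata. The helper names are hypothetical; the distribution name `pysbol` is taken from the conda listing above.

```python
from importlib import metadata


def parse_wheel_tags(wheel_text):
    """Extract the 'Tag:' entries from the text of a WHEEL metadata file."""
    return [line.split(":", 1)[1].strip()
            for line in wheel_text.splitlines()
            if line.startswith("Tag:")]


def wheel_tags(dist_name):
    """Return the wheel tags recorded for an installed distribution.

    Returns None if the distribution is not installed, and an empty list
    if it has no WHEEL file (i.e. it was not installed from a wheel).
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return parse_wheel_tags(dist.read_text("WHEEL") or "")
```

Running `print(metadata.version("pysbol"), wheel_tags("pysbol"))` in each environment would show whether Debian, CentOS and Ubuntu ended up with different wheel builds of the same version.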
I don't have much hope that there is an easy fix here, so most likely we will have to find a workaround. Or figure out a migration path to the native pysbol2.
I also wonder: can these tools be configured to use a local file rather than one retrieved from SynBioHub? If so, a workaround could be to download the files from SynBioHub via a copy of pySBOL2 and then run against the files rather than downloading on the fly.
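A minimal sketch of that workaround, assuming the tools can be pointed at local files: it swaps pySBOL2 for a plain `urllib` download and reuses the URL pattern that PartShop requests in the verbose log (`https://synbiohub.org/public/igem/BBa_R0010/sbol`). `sbol_url` and `download_parts` are hypothetical helpers, and the fetcher is injectable so another HTTP client (or pySBOL2) can be substituted.

```python
import os
from urllib.request import urlopen


def sbol_url(collection_uri, display_id):
    """Build the SBOL download URL for a part: <collection>/<displayId>/sbol."""
    return f"{collection_uri.rstrip('/')}/{display_id}/sbol"


def _urllib_fetch(url):
    """Default fetcher: a plain HTTPS GET with a timeout, returning bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_parts(collection_uri, display_ids, out_dir, fetch=_urllib_fetch):
    """Save each part's SBOL document to out_dir/<display_id>.xml.

    Returns the list of written file paths, in the order of display_ids.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for display_id in display_ids:
        path = os.path.join(out_dir, f"{display_id}.xml")
        with open(path, "wb") as handle:
            handle.write(fetch(sbol_url(collection_uri, display_id)))
        paths.append(path)
    return paths
```

Usage: `download_parts("https://synbiohub.org/public/igem", ["BBa_R0010"], "parts")` could be run once on a machine where downloads work, and the resulting files shipped alongside the tool.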
@jakebeal Thank you for your suggestion. Actually, we deploy tools on the Galaxy Tool Shed, so tools must be available as conda packages. As pySBOL is not available on anaconda.org, we already had to tweak things to make it available on conda via one of our packages. Building from source could be useful for testing, but not for the deployed version.
One additional piece of information: the Debian and Ubuntu systems tested were Docker containers, but I don't see why that would be a problem.
@bbartley Below is the result:
igem.pull('BBa_R0010', doc)
Issuing get request:
https://synbiohub.org/public/igem/BBa_R0010/sbol
Issuing get request: https://synbiohub.org/public/igem/BBa_R0010/sbol
This is weird, because wget downloads the file correctly on the same system (Debian Docker container).
@jakebeal OK, I think this could be a workaround, but I prefer to keep it as a last resort, since I am not the developer of the code (we only integrate this tool), which worked at some point. I'll keep that tip as a backup. Thanks!
I just went digging in a little deeper and found something curious that might be important.
I see that doebase doesn't actually use pySBOL as distributed --- its call to PartShop is actually to an import of SbmlToSBOL. In SbmlToSBOL, pySBOL isn't actually being called either: instead, it's got its own wrapper on libsbol.
I see that doebase doesn't actually use SBOL anywhere other than in that one file, and all of the usage that I saw there looks compatible with pySBOL2. Can you try changing just that dependency and not the other tools?
I've got a version of doebase passing tests with pySBOL2 and have set up a pull request to merge it into doebase: https://github.com/pablocarb/doebase/pull/9
Making this work required correcting a minor bug in doebase, which was exercised only with pySBOL2 and not with pySBOL.
@breakthewall: I see that you have contributed to doebase. Do you have maintainer privileges over there, or do we need @pablocarb to review and approve the pull?
Many thanks for your work guys! However, as I said:
Basing the code on sbol2 makes the tool run, but it generates errors in the downstream tools (lcr_genie and dnaweaver_synbiocad), since every tool in the chain (sbml2sbol, partsgenie_client, doebase, lcr_genie, dnaweaver_synbiocad) was built on SBOL v1. I just tried your changes, and they produce an SBOL file that causes an error in lcr_genie, which is not the case with a file generated by doebase under Ubuntu or macOS. One solution could be to migrate all the tools to SBOL2, but that could be a huge amount of work.
The problem below has nothing to do with doebase:
>>> import sbol
>>> sbol.Config.setOption('verbose', True)
>>> doc = sbol.Document()
>>> igem = sbol.PartShop('https://synbiohub.org/public/igem')
>>> igem.pull('BBa_R0010', doc)
Issuing get request:
https://synbiohub.org/public/igem/BBa_R0010/sbol
Issuing get request: https://synbiohub.org/public/igem/BBa_R0010/sbol
>>> print(doc)
Attachment....................0
Collection....................0
CombinatorialDerivation.......0
ComponentDefinition...........0
Experiment....................0
Test..........................0
Implementation................0
Model.........................0
ModuleDefinition..............0
Sequence......................0
Analysis......................0
Build.........................0
Design........................0
SampleRoster..................0
Activity......................0
Agent.........................0
Plan..........................0
Annotation Objects............0
---
Total.........................0
I am not a maintainer of doebase; I have to make a PR.
I think that there is no easy solution here --- either the tools need to be upgraded (and debugged) or the wheel needs to be debugged and rebuilt.
I suspect that the tool upgrade will be more sustainable: 1) These types of cross-system wheel differences are a major part of why we created pySBOL2 in the first place. 2) Since pySBOL2 is a drop-in replacement for pySBOL, any failure in a tool generally indicates a bug that has already been lurking in that tool that needed fixing.
doebase is now updated to pySBOL2: https://github.com/pablocarb/doebase/pull/9#event-5911951741
Where are the repositories for the other tools?
Thank you very much! Pablo told me it is OK with LCRGenie, but I still have some issues. The workflow is the following:
sbml2sbol -> partsgenie_client -> partsgenie (server) -> doebase -> lcr_genie + dnaweaver_synbiocad
sbml2sbol: https://github.com/neilswainston/SbmlToSbol (maintainer)
partsgenie_client: https://github.com/neilswainston/PartsGenieClient (maintainer)
partsgenie (server): https://github.com/neilswainston/PartsGenie (maintainer)
doebase: https://github.com/pablocarb/doebase (PR)
lcr_genie: https://github.com/neilswainston/LCRGenie (maintainer)
dnaweaver_synbiocad: https://github.com/brsynth/DNAWeaver_SynBioCAD (maintainer)
OK, so after multiple tests, Pablo (the doebase developer) confirms that the output file from the new version of doebase causes an error in LCRGenie.
Do you have a test case that demonstrates said error?
SBOL2
From the master branch (pysbol2) of doebase:
python -m doebase tests/data/input/lycopene.csv --sbol_file tests/data/input/lycopene.xml --func doeGetSBOL constructs.xml
Then, from the LCRGenie master branch:
python -m lcr_genie constructs.xml plan.xlsx
Got (with SBOL or SBOL2):
Traceback (most recent call last):
  File "/Users/jherisson/opt/miniconda3/envs/lcr_genie/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/jherisson/opt/miniconda3/envs/lcr_genie/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/jherisson/github/LCRGenie/lcr_genie/__main__.py", line 44, in <module>
    entry_point()
  File "/Users/jherisson/github/LCRGenie/lcr_genie/__main__.py", line 27, in entry_point
    part_seqs, construct_parts, construct_seqs = sbol_utils.parse(path=args.input)
  File "/Users/jherisson/github/LCRGenie/lcr_genie/sbol_utils.py", line 59, in parse
    for construct_name, parts in parts_per_construct
  File "/Users/jherisson/github/LCRGenie/lcr_genie/sbol_utils.py", line 59, in <listcomp>
    for construct_name, parts in parts_per_construct
  File "/Users/jherisson/github/LCRGenie/lcr_genie/sbol_utils.py", line 58, in <listcomp>
    (construct_name, ''.join([parts_seqs[part] for part in parts]))
KeyError: 'P54978_10000_gene'
SBOL
From the stable branch (pysbol) of doebase:
python -m doebase tests/data/input/lycopene.csv --sbol_file tests/data/input/lycopene.xml --func doeGetSBOL constructs.xml
Then, from the LCRGenie master branch:
python -m lcr_genie constructs.xml plan.xlsx
OK
I believe I've found the problem in LCRGenie: it currently assumes that when you iterate over a set-valued feature, the set will be sorted in alphabetical order. This is a fragile assumption that works only because libSBOL happens to write in that order and read in that order, and nothing touched the document before it walked the features. pySBOL2 does not always give alphabetical order in this situation, and that broke the assumption.
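The general fix is to make the order explicit instead of relying on how a set-valued property happens to iterate. Below is a minimal sketch with plain Python stand-ins, assuming alphabetical order by identity URI is the order the tool expects; the `Part` tuple and the `sorted_by_identity` and `assemble` helpers are hypothetical and are not LCRGenie's actual code.

```python
from collections import namedtuple

# Hypothetical stand-in for an SBOL object: only identity and sequence matter here.
Part = namedtuple("Part", ["identity", "sequence"])


def sorted_by_identity(parts):
    """Return parts in a deterministic order: alphabetical by identity URI.

    Iteration order of a set (or of a set-valued SBOL property) is not
    guaranteed, so the order the code depends on is imposed explicitly.
    """
    return sorted(parts, key=lambda part: part.identity)


def assemble(parts):
    """Concatenate part sequences in identity order."""
    return "".join(part.sequence for part in sorted_by_identity(parts))
```

Because the sort happens at the point of use, the result no longer depends on which library wrote or read the document, or on whether anything modified it in between.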
I've got a branch set up with a fix that works locally for me, but seem to have something wrong in the conda setup for automated testing still: https://github.com/jakebeal/LCRGenie/tree/upgrade-to-pySBOL2
OK, got it worked out. There is a pull request set up for LCRGenie now: https://github.com/neilswainston/LCRGenie/pull/1
@breakthewall I've now got a pull request set up for you on dnaweaver_synbiocad as well: https://github.com/brsynth/DNAWeaver_SynBioCAD/pull/1 The upgrade needed here was identical to that of LCRGenie.
I'm impressed by your responsiveness. I think LCRGenie and DNAWeaver_SynBioCAD have that part in common (copy/paste).
The tests (SBOL files) pass for LCRGenie, which means that with your modifications it still handles SBOL files. However, I get the same error as before on the new files (SBOL2) produced by the new version of doebase:
Traceback (most recent call last):
  File "/Users/jherisson/opt/miniconda3/envs/lcr_genie/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/jherisson/opt/miniconda3/envs/lcr_genie/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/jherisson/github/LCRGenie/lcr_genie/__main__.py", line 44, in <module>
    entry_point()
  File "/Users/jherisson/github/LCRGenie/lcr_genie/__main__.py", line 27, in entry_point
    part_seqs, construct_parts, construct_seqs = sbol_utils.parse(path=args.input)
  File "/Users/jherisson/github/LCRGenie/lcr_genie/sbol_utils.py", line 63, in parse
    for construct_name, parts in parts_per_construct
  File "/Users/jherisson/github/LCRGenie/lcr_genie/sbol_utils.py", line 63, in <listcomp>
    for construct_name, parts in parts_per_construct
  File "/Users/jherisson/github/LCRGenie/lcr_genie/sbol_utils.py", line 62, in <listcomp>
    (construct_name, ''.join([parts_seqs[part] for part in id_sort(parts)]))
KeyError: 'P21683_10000_gene'
The file generated by doebase (SBOL2) is available here: constructs.xml
I've been working on this error now too... it looks like the problem is that, when using pySBOL2, doebase is not actually connecting the sequences to the ComponentDefinition objects for the genes. The problem seems to be somewhere in synbioParts._defineParts. Unfortunately, that function silently swallows a lot of exceptions, so I suspect there is a bug in there that is being exercised by the change in libraries.
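One way to surface such failures while debugging is to log each caught exception with its traceback and hand the failures back to the caller instead of discarding them. This is a minimal sketch of that pattern; `define_parts`, `define_one` and the logger name are hypothetical stand-ins for the per-part logic inside `_defineParts`, not doebase's actual code.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("doebase_sketch")


def define_parts(part_ids, define_one):
    """Apply define_one to each part id without silently hiding errors.

    Returns (defined, failed): a dict of successful results keyed by id,
    and the list of ids whose definition raised an exception.
    """
    defined, failed = {}, []
    for part_id in part_ids:
        try:
            defined[part_id] = define_one(part_id)
        except Exception:
            # logger.exception logs at ERROR level and includes the traceback.
            logger.exception("could not define part %s", part_id)
            failed.append(part_id)
    return defined, failed
```

Callers can then decide whether a partial result is acceptable (for example, refuse to write an output file while `failed` is non-empty) rather than silently producing a file with missing sequences.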
I need to work on other things for the rest of today; do you want to try to dig into that error?
I agree with your diagnosis; that was what I suspected. I know Pablo (the doebase developer) is busy today with an important meeting (which I'm also involved in), but I'm going to try to dig into this issue.
First feedback: I do not understand why I get this behavior (lycopene.xml):
>>> import sbol2
>>> doc1 = sbol2.Document()
>>> doc1.read('tests/data/input/lycopene.xml')
>>> print(doc1)
Design........................0
Build.........................0
Test..........................0
Analysis......................0
ComponentDefinition...........87
ModuleDefinition..............0
Model.........................0
Sequence......................87
Collection....................0
Activity......................0
Plan..........................0
Agent.........................0
Attachment....................0
CombinatorialDerivation.......0
Implementation................0
SampleRoster..................0
Experiment....................0
ExperimentalData..............0
Annotation Objects............0
---
Total: .........................174
>>> doc2 = doc1.copy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jherisson/opt/miniconda3/envs/test_sbol/lib/python3.7/site-packages/sbol2/document.py", line 987, in copy
    return super().copy(target_doc, target_namespace, version)
  File "/Users/jherisson/opt/miniconda3/envs/test_sbol/lib/python3.7/site-packages/sbol2/identified.py", line 281, in copy
    o_copy = o.copy(target_doc, target_namespace, version)
  File "/Users/jherisson/opt/miniconda3/envs/test_sbol/lib/python3.7/site-packages/sbol2/identified.py", line 265, in copy
    self.doc.add(new_obj)
  File "/Users/jherisson/opt/miniconda3/envs/test_sbol/lib/python3.7/site-packages/sbol2/document.py", line 219, in add
    ' to Document. An object with this identity '
sbol2.sbolerror.SBOLError: (<SBOLErrorCode.SBOL_ERROR_URI_NOT_UNIQUE: 17>, 'Cannot add http://liverpool.ac.uk/ComponentDefinition/D5KXJ0_20000_gene/1 to Document. An object with this identity is already contained in the Document')
In short, I tried the code from doebase:
>>> import sbol2
>>> doc1 = sbol2.Document()
>>> doc1.read('tests/data/input/lycopene.xml')
>>> print(doc1)
Design........................0
Build.........................0
Test..........................0
Analysis......................0
ComponentDefinition...........87
ModuleDefinition..............0
Model.........................0
Sequence......................87
Collection....................0
Activity......................0
Plan..........................0
Agent.........................0
Attachment....................0
CombinatorialDerivation.......0
Implementation................0
SampleRoster..................0
Experiment....................0
ExperimentalData..............0
Annotation Objects............0
---
Total: .........................174
>>> doc2 = sbol2.Document()
>>> doc1.copy('http://liverpool.ac.uk', doc2)
<sbol2.document.Document object at 0x7f9248421cd0>
>>> print(doc2)
Design........................0
Build.........................0
Test..........................0
Analysis......................0
ComponentDefinition...........87
ModuleDefinition..............0
Model.........................0
Sequence......................87
Collection....................0
Activity......................0
Plan..........................0
Agent.........................0
Attachment....................0
CombinatorialDerivation.......0
Implementation................0
SampleRoster..................0
Experiment....................0
ExperimentalData..............0
Annotation Objects............0
---
Total: .........................174
>>> for cd in doc1.componentDefinitions:
... print(cd.sequence)
...
http://examples.org/Sequence/D5KXJ0_20000_gene_seq/1
http://examples.org/Sequence/P21683_10000_gene_seq/1
http://examples.org/Sequence/D5KXJ0_20000_cds_seq/1
.
.
.
>>> for cd in doc2.componentDefinitions:
... print(cd.sequence)
None
None
None
None
.
.
.
Looks like there's a bug associated with the remapping of the namespace.
When I put in doc1.copy('http://liverpool.ac.uk', doc2), the identity is mapped into the default namespace that has been set, instead of the requested one:
>>> doc1.componentDefinitions[0].identity
Out[31]: 'http://liverpool.ac.uk/ComponentDefinition/P21684_10000_gene/1'
>>> doc2.componentDefinitions[0].identity
Out[32]: 'http://synbiochem.co.uk/ComponentDefinition/P21684_10000_gene/1'
When I just copy the materials without attempting to remap the namespaces, it comes through correctly and still linked:
>>> doc1.copy(target_doc=doc2)
>>> doc1.componentDefinitions[0].identity
Out[39]: 'http://liverpool.ac.uk/ComponentDefinition/P21684_10000_gene/1'
>>> doc1.componentDefinitions[0].sequence
Out[40]: <sbol2.sequence.Sequence at 0x1389e7490>
>>> doc2.componentDefinitions[0].identity
Out[41]: 'http://liverpool.ac.uk/ComponentDefinition/P21684_10000_gene/1'
>>> doc2.componentDefinitions[0].sequence
Out[42]: <sbol2.sequence.Sequence at 0x135e0ce50>
I've filed a bug on pySBOL2 (https://github.com/SynBioDex/pySBOL2/issues/413).
Since I don't think there's any reason to remap the namespace to itself, however, the copy can simply be done without namespace remapping, which avoids the bug.
Got a pull request up with something that appears to fix this issue (https://github.com/pablocarb/doebase/pull/12), including allowing the file from your test case to run without raising an exception in LCRGenie. I leave it to you to assess whether the output is the desired output or not, as I have discovered more troubling code fragilities in doebase.
OK, all seems to work now! I've tested doebase, lcr_genie and dnaweaver_synbiocad, and no errors were raised, even on Debian.
I'm currently publishing new releases of these tools, and I will look at the upstream tools to try again to migrate them to pySBOL2. I took your modifications, added some major code simplifications, and made a new PR.
Thank you very much for your very useful help, we greatly appreciate it!
You're welcome! Hopefully the changes that were made on these tools can be a good template for debugging switch-overs in the upstream tools as well.
I am closing this issue as I believe it is now complete. If there are additional problems in the pipeline, please open a new issue and link this one.
@breakthewall Side note: if you'd like these tools listed on the SBOL website, you can fill out the SBOL tools form at: https://docs.google.com/forms/d/e/1FAIpQLScOTJLCoTniVPrMh88eg74Eaubh1bFMjncbyG6yt8q4cFLQ-Q/viewform
@jakebeal OK, I filled it out for several tools.
Hi,
We still use pySBOL with old tools, and I noticed a malfunction under Debian and CentOS. We are trying to run these commands on a Galaxy instance running Debian.
If we install pySBOL via pip and run the following commands, the following result is obtained:
While on Ubuntu, the following result (expected) is obtained:
Python 3.7 and 3.8 have been tested on different computers belonging to different users.
I know it's outdated, but we really need these pySBOL functions, as the tools we use are no longer under development and were not written by us. Thanks!