gled0n closed this issue 4 years ago.
I feel that we can just stick to SBOL2. @njhillson @eoberortner, you have more expertise on this topic. What do you think? Can you help out @gled0n?
pySBOL targets SBOL2. I'd also suggest sticking with SBOL2 for now. There is, however, an online converter (https://converter.sbolstandard.org/) that can convert between different sequence formats and SBOL versions. The converter "should" have an API, but I haven't used it so far.
As I understand it, SBOL v1 is obsolete or well on its way there. IMHO, there isn't any mileage in supporting it.
I am currently writing the parser for the three file formats, using Biopython (https://biopython.org/) for FASTA and GenBank and pySBOL for SBOL files. My current plan is to extract, for every sequence in the input file, its name and its bases (A, T, C, G).
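For the FASTA and GenBank side, the "name plus bases" extraction described above might look roughly like this with Biopython's `SeqIO.parse`. This is a minimal sketch, not the actual parser; the helper name `parse_sequences` and the choice to upper-case the bases are assumptions for illustration.

```python
from Bio import SeqIO


def parse_sequences(source, fmt):
    """Return (name, bases) pairs for every record in a sequence file.

    `source` may be a file path or an open text handle; `fmt` is a
    Biopython format string such as "fasta" or "genbank".
    (Hypothetical helper, for illustration only.)
    """
    results = []
    for record in SeqIO.parse(source, fmt):
        # record.id is the first word of a FASTA header, or the
        # accession.version for a GenBank record.
        # Upper-case the bases so FASTA/GenBank/SBOL output is comparable.
        results.append((record.id, str(record.seq).upper()))
    return results
```

Usage would be along the lines of `parse_sequences("my_plasmid.gb", "genbank")` (hypothetical file name). For GenBank input, `record.name` holds the LOCUS name if that is preferable to the accession-based `record.id`.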
pySBOL works well for SBOL2 files but not for SBOL1: `Document.read()` does not seem to read any data from the SBOL1 input file. I have attached a Jupyter notebook and two .xml files (one in SBOL1, the other in SBOL2) so you can see where I am stuck. Am I doing something wrong, or is there a better way to do this? Any insight would be appreciated. Attachment: sbol-parsing.zip
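If SBOL1 input ever did need to be read without pySBOL, one possible fallback is to treat the file as plain RDF/XML and pull out the sequences with the standard library. This is a hedged sketch, assuming the SBOL 1.1 layout in which each `DnaComponent` (namespace `http://sbols.org/v1#`) carries a `displayId` and an inline `dnaSequence/DnaSequence/nucleotides` element; files that reference sequences by URI instead of nesting them would need extra handling. The function name `parse_sbol1_sequences` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed SBOL v1 namespace used in SBOL 1.1 RDF/XML documents.
SBOL1_URI = "http://sbols.org/v1#"
SBOL1_NS = {"s": SBOL1_URI}


def parse_sbol1_sequences(source):
    """Return (displayId, bases) pairs for each DnaComponent that has an
    inline DnaSequence. `source` is a file path or file-like object.
    (Hypothetical helper, for illustration only.)
    """
    root = ET.parse(source).getroot()
    results = []
    # iter() walks the whole tree, so nested subcomponents are found too.
    for component in root.iter(f"{{{SBOL1_URI}}}DnaComponent"):
        display_id = component.findtext("s:displayId", default="", namespaces=SBOL1_NS)
        nucleotides = component.findtext(
            "s:dnaSequence/s:DnaSequence/s:nucleotides", namespaces=SBOL1_NS
        )
        # Skip components without an inline sequence.
        if nucleotides is not None:
            results.append((display_id.strip(), nucleotides.strip().upper()))
    return results
```

Given the consensus in this thread to stay on SBOL2, converting SBOL1 files with the online converter first is probably the simpler route; this sketch is only a stopgap.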