Global-Biofoundries-Alliance / DNA-scanner

Online tool for comparing prices and feasibility of DNA synthesis
MIT License

Parse files in SBOL1-format. #24

gled0n closed this issue 4 years ago

gled0n commented 4 years ago

I am currently writing the parser for the three file formats, using Biopython (https://biopython.org/) for FASTA and GenBank and PySbol for SBOL files. My current idea is the following: for every sequence in the input file, extract its name and the bases themselves (ATCG).
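To make the intended output concrete, here is a minimal sketch of the "name plus bases" idea for FASTA input. It is a standard-library stand-in, not the project's parser: the function name `read_fasta` and the sample data are hypothetical. With Biopython, `SeqIO.parse(handle, "fasta")` would yield `SeqRecord` objects whose `id` and `seq` attributes carry the same two pieces of information.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, bases) for every record in a FASTA stream.

    Hypothetical helper for illustration; Biopython's SeqIO covers
    many more edge cases than this sketch does.
    """
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks).upper()
            # Use the first whitespace-delimited token after '>' as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


# Made-up sample input with two records.
example = StringIO(">seq1 demo\nATCG\natcg\n>seq2\nGGCC\n")
records = list(read_fasta(example))
for record_name, bases in records:
    print(record_name, bases)
```

Returning plain `(name, bases)` pairs keeps the result independent of the input format, so FASTA, GenBank and SBOL readers could all feed the same downstream code.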

PySbol seems to work great for the SBOL2 format but not for SBOL1: the function Document.read() does not seem to be able to read the data from the input file. I have attached a Jupyter notebook and the two .xml files (one in SBOL1 and the other in SBOL2) so you can see where I am stuck. Maybe I am doing something wrong? Or maybe you have a better way to do this? Any insight would be appreciated. sbol-parsing.zip
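One way to see why the two files behave differently is to check which SBOL namespace each one declares before handing it to a parser. The sketch below uses only Python's standard library, not PySbol. It assumes the namespace URIs `http://sbols.org/v1#` and `http://sbols.org/v2#` and the SBOL2 `Sequence`, `displayId` and `elements` element names. The embedded documents are made-up miniatures, not the attached files, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for the two SBOL versions.
SBOL1_NS = "http://sbols.org/v1#"
SBOL2_NS = "http://sbols.org/v2#"


def sbol_version(xml_text):
    """Return 2, 1 or None depending on which SBOL namespace is used."""
    root = ET.fromstring(xml_text)
    namespaces = set()
    for element in root.iter():
        # ElementTree writes qualified tags as '{namespace}localname'.
        if element.tag.startswith("{"):
            namespaces.add(element.tag[1:].split("}", 1)[0])
    if SBOL2_NS in namespaces:
        return 2
    if SBOL1_NS in namespaces:
        return 1
    return None


def sbol2_sequences(xml_text):
    """Return (name, bases) pairs for every SBOL2 Sequence element."""
    root = ET.fromstring(xml_text)
    result = []
    for seq in root.iter("{%s}Sequence" % SBOL2_NS):
        name = seq.findtext("{%s}displayId" % SBOL2_NS, default="")
        bases = seq.findtext("{%s}elements" % SBOL2_NS, default="")
        result.append((name, bases.upper()))
    return result


# Made-up miniature documents for illustration only.
SBOL2_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sbol="http://sbols.org/v2#">
  <sbol:Sequence rdf:about="http://example.org/seq1">
    <sbol:displayId>seq1</sbol:displayId>
    <sbol:elements>atcg</sbol:elements>
  </sbol:Sequence>
</rdf:RDF>"""

SBOL1_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://sbols.org/v1#">
  <DnaComponent rdf:about="http://example.org/c1">
    <displayId>c1</displayId>
    <dnaSequence>
      <DnaSequence rdf:about="http://example.org/s1">
        <nucleotides>atcg</nucleotides>
      </DnaSequence>
    </dnaSequence>
  </DnaComponent>
</rdf:RDF>"""

print(sbol_version(SBOL2_DOC), sbol2_sequences(SBOL2_DOC))
print(sbol_version(SBOL1_DOC), sbol2_sequences(SBOL1_DOC))
```

A check like this would let the tool reject SBOL1 uploads with a clear message, or route them through a converter, instead of failing silently on an empty document.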

jkabisch commented 4 years ago

I feel that we can just stick to SBOL2. @njhillson @eoberortner you guys have more expertise on the topic, what do you think? Can you help out @gled0n?

eoberortner commented 4 years ago

pysbol is an SBOL2 library. I'd also suggest sticking with SBOL2 for now. There is, however, an online converter (https://converter.sbolstandard.org/) that converts between different sequence formats and SBOL versions. The converter "should" have an API, but I haven't used it so far.

neilswainston commented 4 years ago

It's my understanding that SBOL v1 is falling, or has already fallen, into obsolescence. I don't think there is any mileage in supporting it, IMHO.