OpenSourceBrain / SBMLShowcase

OSB showcase of interactions between SBML and NeuroML
http://www.opensourcebrain.org/projects/sbmlshowcase

check file compatibility #77

Open · stellaprins opened this issue 3 weeks ago

stellaprins commented 3 weeks ago

Filed in this pyNeuroML issue and fixed in this PR.

When I made the BIOMD0000000001 table (see issue #57), I noticed that the SBML and SED-ML files are not recognized as such by the current check_file_compatibility_test function.

- [ ] Improve compatibility processing so that SBML / SED-ML files are recognized by means other than their file names or extensions (see comments on issue #31).

- [ ] Figure out how to recognize other formats (e.g. NeuroML, LEMS, SBML-qual, BNGL, RBApy, etc.).

Issue will be addressed here: https://github.com/NeuroML/pyNeuroML/issues/435

sanjayankur31 commented 3 weeks ago

Hrm, they really shouldn't be using .xml as the file extension (as in BIOMD0000000001_url.xml): that's just too vague.

The only "correct" way of checking files is to actually read and parse them to see whether they include the required XML tags. In this case, we'll have to read the file to check that the root tag is <sbml>. We were hoping to avoid this because reading each file to detect its type can be quite resource-intensive when the files are large (NeuroML cell files can run to tens of thousands of lines). I guess we could come up with some way of reading only the first and last N lines of the file to check for a tag :thinking:. This won't be perfect either, because it'll fail for badly formatted files that have comments or garbage in those lines.
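
For comparison, here is a minimal sketch of the "read and parse" approach, assuming Python's standard-library xml.etree.ElementTree; get_root_tag is a hypothetical helper, not part of pyNeuroML. Because iterparse parses incrementally and its first "start" event is the root element, the function can return as soon as it sees the root, without reading the rest of a large file. Leading XML comments are handled by the parser; files that are not well-formed up to the root element return None.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_root_tag(path: str) -> Optional[str]:
    """Hypothetical helper: return the local name of the XML root element, or None."""
    try:
        with open(path, "rb") as f:
            # iterparse is incremental: the first "start" event is the root element,
            # so we stop there instead of parsing the whole (possibly huge) file
            for _event, elem in ET.iterparse(f, events=("start",)):
                # strip a namespace such as "{http://www.sbml.org/sbml/level2/version4}sbml"
                return elem.tag.rsplit("}", 1)[-1]
    except ET.ParseError:
        # not well-formed XML before (or at) the root element
        return None
    return None
```

With this, an SBML file reports "sbml" and a SED-ML file reports "sedML", whatever their extensions are.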

Edit: perhaps we try the file extension first, and if that doesn't work because it's "xml", we read the first/last N lines to match for tags.
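
The "extension first, then peek at the first/last N lines" idea could look roughly like this. It is a hypothetical sketch, not pyNeuroML code: guess_file_type, EXTENSION_TYPES, TAG_PATTERNS, and the defaults n_lines=20 and tail_bytes=4096 are all assumptions. The "last N lines" are approximated by seeking to a fixed-size window at the end of the file, so large files are never read in full. Like any text scan, it can be fooled by a matching tag inside a comment.

```python
import os
import re
from itertools import islice
from typing import Optional

# Hypothetical lookup tables, for illustration only (not pyNeuroML's actual API)
EXTENSION_TYPES = {".sbml": "sbml", ".sedml": "sedml"}
TAG_PATTERNS = {
    "sedml": re.compile(r"</?sedML\b"),  # opening or closing SED-ML root tag
    "sbml": re.compile(r"</?sbml\b"),    # opening or closing SBML root tag
}


def guess_file_type(path: str, n_lines: int = 20, tail_bytes: int = 4096) -> Optional[str]:
    """Guess a file's type from its extension, falling back to a head/tail tag scan."""
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSION_TYPES:
        # Unambiguous extension: no need to open the file at all
        return EXTENSION_TYPES[ext]
    if ext != ".xml":
        return None
    with open(path, "rb") as f:
        # First n_lines lines of the file
        head = b"".join(islice(f, n_lines))
        # Approximate the last n_lines lines using a fixed-size window at the end
        size = f.seek(0, os.SEEK_END)
        f.seek(max(0, size - tail_bytes))
        tail_lines = f.read().splitlines()[-n_lines:]
    text = (head + b"\n" + b"\n".join(tail_lines)).decode("utf-8", errors="replace")
    for file_type, pattern in TAG_PATTERNS.items():
        if pattern.search(text):
            return file_type
    return None
```

SED-ML is checked before SBML here only so that the order is explicit; a SED-ML file mentions SBML in attribute values (e.g. a language URN) but not as a <sbml> tag.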

sanjayankur31 commented 3 weeks ago

Filed: https://github.com/NeuroML/pyNeuroML/issues/435

Will try to fix it in the next couple of days

stellaprins commented 3 weeks ago

I believe that in the SBML test_suite (semantic), all SBML and SED-ML files have the .xml file extension. As a workaround, I use the extensions and filenames to determine whether a given file is an SBML file (.xml file extension + "sbml" but not "sedml" in the filename) or a SED-ML file (.xml file extension + both "sbml" and "sedml" in the filename).
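
The filename workaround can be written as a small function. This is a minimal sketch of the rule as described above; classify_by_filename is a hypothetical name, and the example filenames below are illustrative assumptions rather than a verified listing of the test suite.

```python
import os
from typing import Optional


def classify_by_filename(filename: str) -> Optional[str]:
    """Hypothetical helper implementing the filename-based workaround."""
    name = os.path.basename(filename).lower()
    if not name.endswith(".xml"):
        return None
    # .xml + both "sbml" and "sedml" in the name -> SED-ML
    if "sbml" in name and "sedml" in name:
        return "sedml"
    # .xml + "sbml" (and, given the check above, not "sedml") -> SBML
    if "sbml" in name:
        return "sbml"
    return None
```

For example, classify_by_filename("00001-sbml-l3v2-sedml.xml") returns "sedml", while a name such as BIOMD0000000001_url.xml returns None, which is exactly the case that needs content-based detection.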

I had a quick look at BioModels: there, SBML files appear to have .xml file extensions and SED-ML files appear to have .sedml extensions.