2017-iEvoBio / organization

Logistical details, Suggestions for discussion topics, Agenda

Software Bazaar: The NSB marine microfossil database system: a resource for paleobiology and paleoceanography #19

Closed david-lazarus closed 7 years ago

david-lazarus commented 7 years ago

Software Bazaar: NSB system

Johan Renaudie•, David Lazarus•, Patrick Diver+

The one unique difference between fossils and living organisms is their location in geologic time. Despite the many imperfections of the fossil record, this unique attribute is used in paleobiologic studies of evolutionary patterns and processes in ways not possible with living material. The main resource for such studies over the last decade has been the Paleobiology Database (PBDB), which covers most groups of organisms over the Phanerozoic (0-600 Ma [million years]). PBDB provides data mostly at genus level, correlatable between locations at about 10 my resolution. Much higher resolution data (species level, <1 my age resolution), better suited to studies of evolutionary processes, are offered by the Cenozoic (0-60 Ma) marine microfossil record of planktonic and benthic protists, but these data have been little used as yet for paleobiologic studies. We describe here NSB (Neptune Sandbox Berlin: www.nsb-mfn-berlin.de), a newly expanded, improved database system that, complementary to PBDB, provides access to the published marine microfossil record for paleobiologic and paleoceanographic research. NSB consists of ca. 1 million data records of species occurrences, evolutionary events, and continuous geochronologic models of deep-sea sediment sections stored in a Postgres database; a Python-Django website for non-technical users running pre-defined queries and analyses; and supporting programs and data standards, particularly the ADP (Age Depth Plot) program (also Python) for developing geochronologic age models for sections; the SOD-OFF (stratigraphic occurrence data - open file format) definition for user-friendly recording of primary data and metadata on microfossil occurrences; and an R package linking the fossil occurrence data to paleoceanographic data in external archives, e.g. PANGAEA. The system has been developed using (very) intermittent funding over 25 years by different groups in several countries and is currently based at the Natural History Museum in Berlin.
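To make the idea of a "continuous geochronologic model" of a sediment section concrete, here is a minimal Python sketch that assigns an age to a sample depth by linear interpolation between age-depth tie points. This is only an illustration under stated assumptions: it is not the ADP program's algorithm or NSB's schema, and the function name `age_at_depth` and the tie-point values are hypothetical.

```python
from bisect import bisect_right


def age_at_depth(tie_points, depth):
    """Linearly interpolate an age (Ma) for a depth (m below sea floor).

    tie_points: iterable of (depth_m, age_ma) pairs. Assumption for this
    sketch: depths are strictly increasing once sorted (no hiatuses).
    """
    pts = sorted(tie_points)
    if len(pts) < 2:
        raise ValueError("an age model needs at least two tie points")
    depths = [d for d, _ in pts]
    if not depths[0] <= depth <= depths[-1]:
        raise ValueError("depth lies outside the range covered by the age model")
    # Index of the first tie point deeper than the sample, clamped so that
    # a sample exactly at the deepest tie point uses the last segment.
    i = min(bisect_right(depths, depth), len(pts) - 1)
    (d0, a0), (d1, a1) = pts[i - 1], pts[i]
    return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)


# Hypothetical tie points for one section: (depth in m, age in Ma).
model = [(0.0, 0.0), (50.0, 5.0), (120.0, 19.0)]
print(age_at_depth(model, 25.0))
print(age_at_depth(model, 85.0))
```

With a model of this kind, every species occurrence recorded at a given depth in a section can be given a numerical age, which is what makes occurrences comparable between sections.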

k8hertweck commented 7 years ago

Thanks for your submission to the Software Bazaar! As required by our open source statement, is the source code for your project publicly available? Thank you!

david-lazarus commented 7 years ago

Source code: I have never specifically discussed this with Johan, but I’m sure he’d be happy to share e.g. the website code with anyone who is interested; they should just ask him. He writes the code assuming his own understanding, so it might be a bit cryptic to others. The database definition is summarized in our poster, but we can also share that. We already give direct read-only access to the database to anyone who wants it, so they can see and copy the DDL should they be interested. Johan has put some other projects on GitHub, so maybe he can do that with the database code too. But at the moment he is in France (defending democracy in the elections there), so we’ll have to take it up with him later, e.g. July.

cheers dave

David Lazarus david.lazarus@mfn-berlin.de Museum für Naturkunde Invalidenstrasse 43 10115 Berlin

daisieh commented 7 years ago

Hi David,

Unfortunately, we can't accept entries into the Software Bazaar that don't adhere to our open source statement, but if you're interested, we'd love to hear about your work in a Lightning Talk instead. We also strongly encourage you to urge Johan to develop and release his code publicly on GitHub or a similar platform in the future!

Thanks, Daisie (on behalf of the organizing committee)

david-lazarus commented 7 years ago

Hi,

Software Bazaar vs Lightning Talk - what precisely is the difference, other than a few minutes of reserved talk time? I assume I am allowed to show the database and discuss it freely regardless?

cheers, dave

daisieh commented 7 years ago

There is no interactive demo associated with the lightning talk; you'd just present a short talk about your database. You could do a brief demo as part of the talk, but it'd be hard to squeeze that into 5 minutes.

david-lazarus commented 7 years ago

Hi Daisie,

Well, there is already a poster on the database at the main meeting, and a re-check of the iEvoBio schedule seems not to offer alternatives to the suggested (IMHO, for our purposes limited) lightning talk. Thus I’ve decided to skip the iEvoBio session.

I had thought the main point of the meeting was simply informal exchange of information at various levels about evolution software, and that it was open to people working in different areas of science and in different ways. There are in fact - perhaps not in your neck of the woods but elsewhere in science - large bodies of important software that are either, like ours, mostly informally shared but not posted somewhere; or restricted in some way (most large science organisations, so far as I know, do not post all the code behind their public websites); up to and including closed commercial software, which is often the only way to fund the programmer(s) so the software remains available to the community. There are many valid models that are used to support science software development. This is the first time in my 30+ years of involvement with science software that pre-posted code has been a requirement for a meeting. Perhaps you should consider being more open to these alternate ways of working in the future - you’ll communicate with more people that way (and will be putting the support of software development at the top of the priority list).

Good luck with your meeting, though.

Regards,

Dave Lazarus

k8hertweck commented 7 years ago

Thanks for sharing your concerns. We acknowledge that there are other ways software is developed and supported, but iEvoBio has been committed to open source since its inception, and we'll endeavor to be as clear about our requirements as possible.