alan-turing-institute / TuringDataStories

TuringDataStories: An open community creating “Data Stories”: A mix of open data, code, narrative 💬, visuals 📊📈 and knowledge 🧠 to help understand the world around us.

[Turing Data Story] #175

Closed joecerniglia closed 2 years ago

joecerniglia commented 2 years ago

Story description

I have two 20th-century glass fragments that I submitted to a lab for analysis of their elemental composition. The first fragment is part of the base of a cosmetic ointment pot. It was collected in 2010 from the island of Nikumaroro and is datable to the 1930s. This ointment pot may have belonged to aviation pioneer Amelia Earhart. The second fragment was purchased on eBay as a reference sample for the first.

I have located a 1987 United Kingdom database of glass samples. This database was used to train an early machine learning model to help categorize glass fragments from crime scenes, such as break-ins.

There are two main research questions I wish to answer:

1) What do the correlations between elements for the different types of glass in the 1987 database reveal about late 20th-century glassmaking, as compared with early 20th-century glassmaking techniques?
2) Using machine learning to train a model on the 1987 database, can that model identify one or both of the older samples, unseen by the model, as containers?
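Question 2 is a supervised classification task: fit a model on labeled samples, then ask it for the class of a sample it has never seen. Below is a minimal sketch of that workflow using a k-nearest-neighbor vote on standardized features. It is only an illustrative stand-in, not the model I will necessarily use. The function names are hypothetical, and the training values are invented placeholders rather than rows from the database.

```python
import math
from collections import Counter


def fit_scaler(features):
    """Return per-column (mean, std) so columns on different scales are comparable."""
    stats = []
    for column in zip(*features):
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        stats.append((mean, std if std > 0 else 1.0))  # avoid division by zero
    return stats


def scale(vector, stats):
    """Standardize one feature vector with previously fitted column statistics."""
    return [(v - mean) / std for v, (mean, std) in zip(vector, stats)]


def knn_predict(train_features, train_labels, sample, k=3):
    """Predict the label of `sample` by majority vote among its k nearest neighbors."""
    stats = fit_scaler(train_features)
    scaled_sample = scale(sample, stats)
    distances = sorted(
        (math.dist(scale(features, stats), scaled_sample), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# ASSUMPTION: invented placeholder values (Na, Mg, Ca), not real measurements.
# Labels follow the database's class codes: 1 = float-processed building
# windows, 5 = containers.
TRAIN_FEATURES = [
    [13.2, 3.6, 8.4],
    [13.4, 3.5, 8.6],
    [13.0, 3.7, 8.5],
    [12.8, 0.5, 11.2],
    [12.6, 0.3, 11.5],
    [12.9, 0.6, 11.0],
]
TRAIN_LABELS = [1, 1, 1, 5, 5, 5]

# A hypothetical unseen sample, standing in for one of the older fragments.
unseen_sample = [12.7, 0.4, 11.3]
predicted_type = knn_predict(TRAIN_FEATURES, TRAIN_LABELS, unseen_sample)
```

One caveat with this design is that a classifier of this kind always returns one of the classes it was trained on, so a "container" prediction for a 1930s fragment needs to be read alongside how far that fragment sits from the 1987 training samples.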

Which datasets will you be using in this Turing Data Story?

This analysis uses a U.K. database of glass samples from 1987 developed by Ian W. Evett and Ernest J. Spiehler. Their paper, first presented at the 1987 conference of the KBS (Knowledge-Based Systems) in Government, is: Evett, Ian W. and Spiehler, E. J., "Rule Induction in Forensic Science." The database is available from the UCI Machine Learning Repository (University of California, Irvine) as the Glass Identification data set.
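As a rough sketch of how the database could be read and how the per-type correlations in research question 1 might be computed, the snippet below parses comma-separated rows and computes a Pearson correlation between two elements within each glass type. The column names and class codes follow the documentation published with the data set. The function names are hypothetical, and `PLACEHOLDER_ROWS` contains invented values in the same layout, not rows from the database.

```python
import csv
import io
import math
from collections import defaultdict

# Column layout of the data file, which has no header row.
# RI is the refractive index; the element columns are oxide weight percentages.
COLUMNS = ["Id", "RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type"]

# Class codes as documented with the data set.
GLASS_TYPES = {
    1: "building windows (float processed)",
    2: "building windows (non-float processed)",
    3: "vehicle windows (float processed)",
    4: "vehicle windows (non-float processed)",
    5: "containers",
    6: "tableware",
    7: "headlamps",
}


def load_glass(text):
    """Parse comma-separated glass rows into a list of dicts keyed by column name."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        raw = dict(zip(COLUMNS, record))
        row = {name: float(raw[name]) for name in COLUMNS[1:-1]}
        row["Type"] = int(raw["Type"])
        rows.append(row)
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlation_by_type(rows, element_a, element_b):
    """Correlation between two elements, computed separately for each glass type."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Type"]].append(row)
    return {
        glass_type: pearson([r[element_a] for r in group],
                            [r[element_b] for r in group])
        for glass_type, group in groups.items()
        if len(group) >= 2
    }


# ASSUMPTION: invented placeholder rows, only to show the file layout.
PLACEHOLDER_ROWS = """\
1,1.52,13.0,3.5,1.2,72.0,0.5,8.0,0.0,0.0,1
2,1.52,13.5,3.5,1.2,72.0,0.5,8.5,0.0,0.0,1
3,1.52,14.0,3.5,1.2,72.0,0.5,9.0,0.0,0.0,1
4,1.52,12.0,0.5,1.8,72.5,0.4,11.0,0.0,0.0,5
5,1.52,13.0,0.5,1.8,72.5,0.4,10.0,0.0,0.0,5
6,1.52,14.0,0.5,1.8,72.5,0.4,9.0,0.0,0.0,5
"""

na_ca_by_type = correlation_by_type(load_glass(PLACEHOLDER_ROWS), "Na", "Ca")
```

With the real file, the same call would be made on the downloaded text, and `GLASS_TYPES` can be used to label each result.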

Additional context

A group of colleagues and I wrote a paper on the jar in 2013, in which we stated that the jar had an unusual chemistry. Books on the art of glassmaking gave us evidence for this, but we lacked the data to make comparisons for ourselves. Today, three factors have made this comparison possible:

  1. The tools of machine learning have become more widely available and accessible.
  2. Tools such as Jupyter Notebook have made reproducible research more widely accepted and desirable.
  3. Database repositories provide freely accessible, downloadable data with which to experiment using the tools in items 1 and 2.

Ethical guideline

Ideally, a Turing Data Story has these properties and follows the Five Safes framework.

Current status

Updates

joecerniglia commented 2 years ago

Corrected "Using a maching learning to train a model..." to "Using machine learning to train a model..."