alfiehiscox / spqr-api

SPQR-API is an open-source API for querying information about the ancient Romans
MIT License

Rulers Data: Scraping and research #1

Open alfiehiscox opened 1 year ago

alfiehiscox commented 1 year ago

Task

This issue will most likely be a long-lived one, so as to encourage discussion about who to include in the list and who to leave out.

The rulers data needs to be scraped and formatted into a CSV file before we can make it available in the API.

This entails gathering the rulers of Rome between 200 BCE and 200 CE, as far as we know them. By 'rulers' we mean, strictly, 'a person exercising government or dominion', and we stress that this will be far from a complete list.

The people on this list should cross-reference with the events listed in the events endpoint so there is some kind of continuity.

A Starting Point

In the case of the Emperors of Rome (27 BCE - 200 CE) the list is relatively well known, although the chronology is not exact. It is safe to say that all Emperors should be included in the data.

For Republican Rome things are a little harder. We can't (and won't) try to detail every senate member and politico in the Roman Republic, but that does not mean we cannot collect data about interesting people in this period. The consuls seem to be a well-established way of talking about power in this period, but politicians of note should also be included: think Cicero, Sulla, etc.

The data should exclude people who (although potentially relevant) are not considered 'rulers' of Rome. Hannibal is of key significance to the period but was never a 'ruler' of Rome. The same could be said of Spartacus, for example.

This endpoint is largely about people, so the following data points should be given at the least. This is just a starting point and is liable to change:

{
  "name": "Gaius Julius Caesar",
  "birth": "100 BCE",
  "death": "44 BCE",
  "cause-of-death": "assassination",
  "titles": [
    {
      "title": "Dictator Perpetuo",
      "period": "44 BCE"
    },
    {
      "title": "Consul",
      "period": "59 BCE"
    }
  ]
}

Again this is just a starting point and is likely to change throughout the various iterations of this endpoint.
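To make the shape of a record concrete, here is a minimal Go sketch of how the example object could be decoded. The type names, field names, and the `parseRuler` helper are illustrative assumptions; only the JSON keys come from the example. Note that Go's `encoding/json` is strict, so a trailing comma inside an object would fail to decode.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Title is one office held by a ruler, mirroring an entry in "titles".
type Title struct {
	Title  string `json:"title"`
	Period string `json:"period"`
}

// Ruler mirrors the example object. The Go names are hypothetical;
// only the JSON keys in the struct tags come from the example.
type Ruler struct {
	Name         string  `json:"name"`
	Birth        string  `json:"birth"`
	Death        string  `json:"death"`
	CauseOfDeath string  `json:"cause-of-death"`
	Titles       []Title `json:"titles"`
}

// parseRuler decodes a single ruler object from JSON.
func parseRuler(data []byte) (Ruler, error) {
	var r Ruler
	err := json.Unmarshal(data, &r)
	return r, err
}

// caesarJSON is the example record from this issue.
const caesarJSON = `{
  "name": "Gaius Julius Caesar",
  "birth": "100 BCE",
  "death": "44 BCE",
  "cause-of-death": "assassination",
  "titles": [
    {"title": "Dictator Perpetuo", "period": "44 BCE"},
    {"title": "Consul", "period": "59 BCE"}
  ]
}`

func main() {
	r, err := parseRuler([]byte(caesarJSON))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%s held %d titles\n", r.Name, len(r.Titles))
}
```

Dates are kept as plain strings here because the example stores them that way; whether to parse them into numeric years is left open.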

alfiehiscox commented 1 year ago

Rather than serializing to CSV as originally thought, it makes more sense to use JSON. There is an inherent tree-like structure to the data, with rulers having their various titles.

We currently have duplicates for emperors and consuls; Julius Caesar is the notable exception. We only go up to 43 BCE for the consuls, even though they continue throughout the imperial period of Rome.
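One possible way to handle the duplicates is to fold records that share a name into a single ruler with a combined list of titles. The sketch below assumes that an exact name match identifies the same person, which will not always hold for Roman names, so treat it as a first pass only. `mergeByName` and the trimmed-down types are hypothetical.

```go
package main

import "fmt"

// Title and Ruler are cut-down, hypothetical versions of the record
// shape discussed in this issue.
type Title struct{ Title, Period string }

type Ruler struct {
	Name   string
	Titles []Title
}

// mergeByName collapses records that share a Name into one record,
// concatenating their titles. Output keeps first-seen order.
// Assumption: identical names mean the same person.
func mergeByName(rulers []Ruler) []Ruler {
	index := make(map[string]int) // name -> position in merged
	var merged []Ruler
	for _, r := range rulers {
		if i, ok := index[r.Name]; ok {
			merged[i].Titles = append(merged[i].Titles, r.Titles...)
			continue
		}
		index[r.Name] = len(merged)
		// Copy the slice so later appends do not alias the input.
		r.Titles = append([]Title(nil), r.Titles...)
		merged = append(merged, r)
	}
	return merged
}

func main() {
	rulers := []Ruler{
		{Name: "Augustus", Titles: []Title{{"Consul", "43 BCE"}}},
		{Name: "Tiberius", Titles: []Title{{"Emperor", "14 CE - 37 CE"}}},
		{Name: "Augustus", Titles: []Title{{"Emperor", "27 BCE - 14 CE"}}},
	}
	for _, r := range mergeByName(rulers) {
		fmt.Println(r.Name, r.Titles)
	}
}
```

Matching on name plus birth year would be a stricter key, at the cost of missing records where the birth date is unknown.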

alfiehiscox commented 1 year ago

Data! Data is king.

We need a bit more data, I think. I currently have all of the Emperors in the timeframe and the Consuls, but apart from their birth and death dates I think we need more. Last night I created a little BubbleTea TUI to parse the data coming from Wikipedia. It allows you to select text from the opening paragraph of a Consul's page. Since I get my list of consuls from Wikipedia, there are links to most of them. I do think this will be a partly manual process of data collection, and I hope to settle on a kind of archive format to save it to disk. Currently that format is JSON, but it's getting a little unwieldy, so maybe there is another way of formatting the nested data. The TUI will be added to src at some stage as well.
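One option for the archive format, offered only as a sketch and not something settled in this issue, is JSON Lines: one ruler object per line instead of one large nested document. Each record stays nested, but the file can be appended to and diffed line by line. The `encodeArchive` and `decodeArchive` helpers and the cut-down types are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Hypothetical, cut-down record types for illustration.
type Title struct {
	Title  string `json:"title"`
	Period string `json:"period"`
}

type Ruler struct {
	Name   string  `json:"name"`
	Titles []Title `json:"titles"`
}

// encodeArchive writes one JSON object per line.
// json.Encoder.Encode appends a newline after each value.
func encodeArchive(rulers []Ruler) (string, error) {
	var sb strings.Builder
	enc := json.NewEncoder(&sb)
	for _, r := range rulers {
		if err := enc.Encode(r); err != nil {
			return "", err
		}
	}
	return sb.String(), nil
}

// decodeArchive reads back a stream of JSON objects until EOF.
func decodeArchive(s string) ([]Ruler, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	var rulers []Ruler
	for {
		var r Ruler
		err := dec.Decode(&r)
		if err == io.EOF {
			return rulers, nil
		}
		if err != nil {
			return nil, err
		}
		rulers = append(rulers, r)
	}
}

func main() {
	archive, err := encodeArchive([]Ruler{
		{Name: "Augustus", Titles: []Title{{"Emperor", "27 BCE - 14 CE"}}},
		{Name: "Tiberius", Titles: []Title{{"Emperor", "14 CE - 37 CE"}}},
	})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Print(archive)

	rulers, err := decodeArchive(archive)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("read back", len(rulers), "rulers")
}
```

The same encoder could write to a file on disk instead of a string builder; the string-based helpers just keep the example self-contained.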