Closed derrickmehaffy closed 5 years ago
When you query on a system, EDSM gives you data on all bodies in that system. We should then be able to match on name relatively easily by converting the names to a common format. We can use bodyid for matching in the event of a name change. E.g. I had a planet renamed to Garibaldi. Did I mention that before?
Yeah, renames are a problem. Anthor keeps a file of known "Special Systems"; not sure if that includes renames.
We may consider parsing this PHP file into a common format (or asking Anthor to provide a JSON file for ease of use), but that could be referenced if needed.
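A minimal sketch of that matching, assuming each body record is a dict with a name and a body id. The field names `name` and `bodyId` and both helper functions are hypothetical, chosen only for illustration:

```python
def normalise(name):
    """Lower-case and collapse whitespace so equivalent names compare equal."""
    return " ".join(name.lower().split())


def match_body(ours, edsm_bodies):
    """Return the EDSM record matching one of our bodies, or None.

    Tries the normalised name first, then falls back to the body id so
    that a renamed body still matches.
    """
    by_name = {normalise(b["name"]): b for b in edsm_bodies}
    hit = by_name.get(normalise(ours["name"]))
    if hit is not None:
        return hit
    if ours.get("bodyId") is None:
        return None
    for b in edsm_bodies:
        if b.get("bodyId") == ours["bodyId"]:
            return b
    return None
```

The id fallback only runs when the name lookup fails, so an ordinary body never pays for the second pass.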
Started work on it but will end up significantly refactoring.
It will look something like this
For each system that needs updating
    get bodies from EDSM
    for each body
        if body in database then
            update
        else
            insert new body
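The outline above could be sketched roughly as follows. Everything here is a stand-in: `fetch_bodies` represents the EDSM request and `db` is a plain dict in place of the real table, so this shows only the update-or-insert logic, not the actual refactored script.

```python
def sync_bodies(systems, fetch_bodies, db):
    """Update or insert every body of every system that needs updating.

    systems      -- iterable of system names
    fetch_bodies -- callable returning a list of body dicts for a system
                    (stand-in for the EDSM request)
    db           -- dict keyed by (system, body name); stand-in for our table
    Returns a (updated, inserted) tuple of counts.
    """
    updated = inserted = 0
    for system in systems:
        for body in fetch_bodies(system):
            key = (system, body["name"])
            if key in db:
                db[key].update(body)   # existing body: refresh its fields
                updated += 1
            else:
                db[key] = dict(body)   # new body: insert a copy
                inserted += 1
    return updated, inserted
```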
I have some doubts about how we are approaching this. My concern is that we will keep hitting EDSM and as our data set grows we will put a heavier load on EDSM.
I think the best way of doing this is to use the EDSM nightly downloads to get only data that has changed in the last 7 days. We would look at this and if it contains any systems that we have in our database then we can update them.
Every system that we store should have at least one body. So if we add a system and it doesn't have a body, then we can fetch the bodies from EDSM at that point.
When we get the Celestial bodies update dump, we can update any existing systems with the latest body data.
If we think we are out of sync, we can either hit up EDSM with individual API calls or get a full body dump.
Do we really need a body on your good 4k USS systems? :P
I've considered the dump before, but we need to make Strapi as lightweight as possible, grabbing only the data we need to make it easier and faster to sync, and keeping size low to allow us to run multiple instances in many places.
Good point about the USS systems; we need a way of excluding systems from body data.
I'm not proposing that we mirror EDSM, just that we get EDSM to tell us what has changed. So we could download the bodies update and not actually update anything at all.
Here is how we could exclude USS. We maintain a list of models for which we would like to have body data, e.g. bmsites, tgsites etc., but not USS sites.
For each of these models we can build up a list of systems to check for body data. Next, download the bodies update. If any of our systems are in the bodies update then we update them.
We can run this process once per day or every few days if you prefer. Most of the time we wouldn't update any sites at all.
It would work something like this.
    models = getModelList()
    systems = getSystems(models)
    r = requests.get(url, stream=True)
    for line in r.iter_lines():
        j = json.loads(line)
        if j["systemId64"] in systems:
            updateSystem(j)
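The getSystems step in the outline above might look something like this. It is a sketch only: the list name `MODELS_WITH_BODIES`, the `rows_for_model` callable and the `systemId64` column on our site rows are all assumptions standing in for the real database access.

```python
# Models we want body data for; USS sites are deliberately left out.
MODELS_WITH_BODIES = ["bmsites", "tgsites"]


def get_systems(models, rows_for_model):
    """Collect the id64 of every system referenced by the listed models.

    rows_for_model is a callable returning site rows (dicts) for a model
    name; it stands in for whatever database query is actually used.
    """
    systems = set()
    for model in models:
        for row in rows_for_model(model):
            systems.add(row["systemId64"])
    return systems
```

Returning a set keeps the later "is this system one of ours" check cheap while the dump is streamed.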
Here is a little proof of concept
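    import json
    import requests

    url = "https://www.edsm.net/dump/bodies7days.json"
    # id64s of the systems we are interested in (sample values)
    wanted = {224644818084, 626171727272, 828281818282, 828282828}

    r = requests.get(url, stream=True)
    for raw in r.iter_lines():
        # iter_lines() yields bytes, so decode before comparing with str
        line = raw.decode("utf-8").strip()
        if line in ("", "[", "]"):
            continue
        try:
            # every record except the last ends with a trailing comma
            d = json.loads(line.rstrip(","))
            if d["systemId64"] in wanted:
                print(json.dumps(d, sort_keys=True, indent=4))
        except (ValueError, KeyError):
            print("ERROR")
            print(line)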
The only thing we might want to consider is chunking the list of id64s that we are searching. But how much RAM will they take up anyway? Let's say we had 100,000 bodies and each id64 took 64 bytes, allowing for internal Python gubbins; then that's only about 6.4 MB of RAM.
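That back-of-envelope figure can be checked directly. The sketch below measures a set of 100,000 made-up id64-sized integers; the helper name is hypothetical, and `sys.getsizeof` figures are specific to the Python implementation in use, so treat the result as an order-of-magnitude check only.

```python
import sys


def estimate_set_bytes(ids):
    """Rough memory use of a set of ints: the set itself plus each int."""
    return sys.getsizeof(ids) + sum(sys.getsizeof(i) for i in ids)


# 100,000 made-up values of roughly id64 magnitude
sample = set(range(10**11, 10**11 + 100_000))
print(f"{estimate_set_bytes(sample) / 1_000_000:.1f} MB")
```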
So I'm doing two scripts,
This will find systems that have no bodies recorded against them and look them up in EDSM. If there are a large number of bodies to look up this is potentially quite slow. I performed a load using systems from faction kill and hyperdictions, and it was interesting to note that there were two systems that were in EDSM but didn't have primary stars. I flew out to visit them and then re-ran the script, and it populated them with the data.
It would be a nice idea to generate a list of systems that have no body and use that as the basis of a patrol so that we can automatically ask people to update EDSM.
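Generating that list is a simple set difference. A sketch, assuming we can get the id64 of every stored system and the body rows as dicts with a `systemId64` field (both assumptions about our schema):

```python
def systems_without_bodies(system_ids, body_rows):
    """Return the sorted ids of systems with no body recorded against them."""
    with_bodies = {row["systemId64"] for row in body_rows}
    return sorted(set(system_ids) - with_bodies)
```

The same list could feed both the EDSM lookup and a patrol of systems that still need visiting.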
A bit of fine tuning to do to the inserts.
This will run once per day and will download a list of all the bodies that changed in the last 7 days. It will stream the file and update or insert the data only when the system matches one of the systems held on our system. The limitation of this script is that it has to be run at least weekly, and updates are no more frequent than once daily. We can also set a parameter on this that will allow us to use the full dump to do a complete refresh if we think we need to.
This is now being handled by the following Node-based tool: https://github.com/canonn-science/Canonn-EDSM-Updater
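The filtering part of that script could be sketched as a generator that takes dump lines from any source, so the same code serves the 7-day file or a full dump depending on which URL is passed to the download step. The function name is hypothetical, and the one-record-per-line layout is assumed from the proof of concept earlier in this thread.

```python
import json


def matching_bodies(lines, wanted):
    """Yield body records from dump lines whose systemId64 is in wanted.

    lines  -- iterable of str or bytes, one dump line each
    wanted -- set of id64 values for systems held locally
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # array brackets and blank lines carry no record
        try:
            body = json.loads(line)
        except ValueError:
            continue  # skip malformed lines rather than abort the run
        if isinstance(body, dict) and body.get("systemId64") in wanted:
            yield body
```

With requests, the lines could come from `requests.get(url, stream=True).iter_lines()`, which keeps memory use flat because only one record is held at a time.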
Marking this as closed for now
Tracking Issue for EDSM Python script for pulling body data and caching the data locally as well as scripted for cron updates.
Due to our lack of available JavaScript devs, we should build this in Python for now and move to JavaScript later.
Breakdown
So most of the other 3rd parties currently store the body name as
systemname bodyname
however we differ in that we break the two apart. This will be the biggest challenge, I believe, as you cannot just hit the body table and then query EDSM. You will need to grab the system ID, grab the systemName, and query EDSM using the systemName. Then you will need to join the systemName with the bodyName and search the response for the data. There should also be an else if clause so that, if you do not find the entry, you then just search the data for the bodyName. I think in most cases those two should pull the correct data and properly handle special-case systems like Sol, where the body name is a custom-named object such as Mars.
Also, as a footnote to the above, it is possible that the body data does not exist, in which case we will need to skip that. I can add a special boolean column if needed to help with the cron script so we can track how many times it has been skipped (may need to add another argument to say don't query if skip > x amount) so that the cron script doesn't just keep trying to look up the same missing body over and over (see example arguments below).
Required arguments
Similar to the systems script, below are some arguments that should be added to allow for ease of use in specific use cases:
development, staging, or production
--development
database.json file for MySQL connection data
edsmID not every column
Breakdown of our columns vs EDSM
(WIP)
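The lookup described in the Breakdown above, including the fallback for custom-named bodies and the skip limit, could be sketched like this. Both function names, the `name` field on the EDSM records and the `max_skips` parameter are assumptions for illustration, not an agreed design.

```python
def find_body(system_name, body_name, edsm_bodies):
    """Find one of our bodies in the EDSM response for its system.

    First tries "<systemName> <bodyName>", then the bare body name, which
    covers custom-named bodies such as Mars in Sol. Returns None if the
    body is not present in the response.
    """
    by_name = {b["name"].lower(): b for b in edsm_bodies}
    full = f"{system_name} {body_name}".lower()
    return by_name.get(full) or by_name.get(body_name.lower())


def should_query(skip_count, max_skips):
    """False once a body has been skipped more than max_skips times."""
    return skip_count <= max_skips
```

When `find_body` returns None, the cron script would increment the skip count for that body and move on.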