eyeonus / Trade-Dangerous

Mozilla Public License 2.0

EDDB will soon cease operations #110

Open bgol opened 1 year ago

bgol commented 1 year ago

In case you didn't notice: https://forums.frontier.co.uk/threads/eddb-a-site-about-systems-stations-commodities-and-trade-routes-in-elite-dangerous.97059/page-37#post-10114765

eyeonus commented 1 year ago

Well that's not helpful.

Meowcat285 commented 1 year ago

EDDB has now shut down. Are there any plans to update TD to use something else, like Inara for example?

Edit: It looks like Inara doesn't have an API for exporting data.

eyeonus commented 1 year ago

Working on it.

Tromador commented 1 year ago

For now TD is working, but it uses the stations and systems from the day EDDB died. That said, the first phase of server work for this change is now functionally complete: we are producing our own listings.csv and continue to produce listings-live.csv as normal. The next task is dealing with new systems and stations, as those need entirely new code so they are imported correctly.

aadler commented 1 year ago

Late to the party here, but would it be possible to pull from EDSM, Inara, or even Spansh? The first two rely on EDDN, as did EDDB. Perhaps there is a way to hook into that feed.

eyeonus commented 1 year ago

I don't think any of them have a means of obtaining bulk data. I know I looked at this back when the end of EDDB was first announced, and things didn't pan out. IIRC, one of the places I looked at, I think it was Inara, did have an API, but it was for single queries only, as in "give me the data for this station", so that wouldn't work.

I would love to be wrong about this, because figuring out how to do it ourselves sucks.

aadler commented 1 year ago

I'm not an expert in the slightest, so I don't know if this is even feasible, but can @Tromador read and aggregate EDDN's commodities feed for pricing purposes? Start with what we have now and update hourly/daily from an EDDN feed. I'm pretty sure Inara does this. Perhaps EDSM can be approached for system information. Or we can ask @spansh (I believe that's Gareth) if we can download his data dumps for systems.

What is the major problem, not having an authoritative source for ships, modules, components?

eyeonus commented 1 year ago

I'm not an expert in the slightest, so I don't know if this is even feasible, but can @Tromador read and aggregate EDDN's commodities feed for pricing purposes?...

This is what we have now. Tromador's server runs a Python script that does exactly that; it's how listings-live.csv was generated before the end of EDDB, and since the "first phase" Tromador mentioned, it's how the listings.csv file is generated as well.

For details: https://github.com/eyeonus/EDDBlink-listener
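
As a hedged illustration (not the listener's actual code), a minimal EDDN subscriber in Python might look like this; it assumes the pyzmq library, and the relay endpoint and field names are EDDN's public commodity schema:

import json
import zlib
import zmq

def commodity_messages():
    # EDDN's relay publishes zlib-compressed JSON over ZeroMQ.
    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt_string(zmq.SUBSCRIBE, '')
    sub.connect('tcp://eddn.edcd.io:9500')
    while True:
        msg = json.loads(zlib.decompress(sub.recv()))
        if msg['$schemaRef'].startswith('https://eddn.edcd.io/schemas/commodity/'):
            # The message carries systemName, stationName, and the commodities
            # list, which is everything a listings row needs.
            yield msg['message']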

What is the major problem, not having an authoritative source for ships, modules, components?

Yes. As far as market data is concerned, we've got that covered. However, we have no means of updating TD with anything new: new commodities, new stations, any of it. (Actually, I think I did make it so new commodities get added to the DB when they show up, but I'm not certain, and I'm too lazy to look right now.)

For some things, like rare items, that's not a big problem: it isn't very often that a new ship module gets added to the game, so adding them manually isn't a huge deal, even if it would be nice to have it all done automagically.

Basically, right now we can get all the information contained in an EDDN Commodities message and process it for inclusion in the DB, but some information TD needs isn't contained in that message, so we need to start processing other EDDN message types in the script too.

For example, if we want to know what star system a station is in, we need to process the Docked event of a Journal message.
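
A hedged sketch of that step, reusing the subscriber idea above (the event and field names follow EDDN's journal schema):

def station_system(msg):
    # Journal messages with event == 'Docked' pair a station with its star system.
    body = msg['message']
    if body.get('event') == 'Docked':
        return body['StarSystem'], body['StationName']
    return None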

eyeonus commented 1 year ago

I'm not going to lie, my life is in a bit of an upheaval right now, so I haven't had time to work on this very much.

If anyone who reads this wants to take a crack at it please feel free.

aadler commented 1 year ago

Completely understood; real life comes first, second, and third. We're extremely grateful for the work you (and @Tromador and @bgol and @kfsone ) have done to make our lives both easier and more fun.

EyeMWing commented 6 months ago

Is there any interest in bringing this back? @eyeonus @Tromador in particular.

I've got the most egregious problems with eddblink_listener hammered out and running on my machine, and I'm working on a mechanism to replay the archived EDDN streams to load the data from between when EDDB went down and now. That should get TD up and going with the old (EDDB-era) star systems.

After that, I don't think it would be a very big lift to get a star system feed out of the EDDN journals. I haven't actually looked at the guts of TD to see what else it might need.

Tromador commented 6 months ago

Hi,

The last position we had (bearing in mind my memory is not what it once was) was that there were some issues causing threads to hang. The database also still liked to grow large, though eyeonus had done a lot of work in that area and it was miles better than it once was. There was some testing to be done to try to pin down the cause of some problems, I can't even remember now what, but my health took a downturn, my cat became diabetic, eyeonus was injured in a road accident, and since nobody was asking for TD to be mended it seemed very low on the list of priorities, if it was on the list at all.

Assuming good data goes into the database, I think TD should basically work, I mean why not? Its problem may be that it was written back when we were still in beta, with far fewer stations, and perhaps it hasn't scaled well. It doesn't seem to matter how much memory or CPU it gets, or how fast the storage is: some queries are disgustingly slow. That said, I still maintain my belief that TD's user-side query syntax is far and away ahead in the questions it can answer.

I am more than willing to host a working version of the software with the associated datasets for download. What I may not have the energy for is a lot of convoluted testing if weird and wonderful bugs crop up.

Cheers

Trom


eyeonus commented 6 months ago

You should absolutely feel encouraged to submit a PR to either or both repositories; I would love to have some help with this stuff.

That said, to the best of my knowledge Tromador's server is still running, and I long ago patched the listener to work without EDDB, so assuming that's true, TD is still up to date, at least regarding market data for the systems that existed when EDDB went down.

I look forward to seeing your fixes.


aadler commented 5 months ago

I note that @spansh (https://github.com/spansh), of neutron plotter fame, now has market data. Here is an example. He also has system data dumps. Should we reach out to him to see if there is anything TradeDangerous can leverage?

eyeonus commented 5 months ago

We could potentially use the dumps: https://spansh.co.uk/dumps. I haven't looked at them yet, but nightly dumps are what we used from EDDB, so....

Also, whatever happened to @EyeMWing? I expected to see a PR at some point from that one.

rmb4253 commented 5 months ago

I don't have the know-how to help with this in any way, but I'm so pleased that TD has not been completely forgotten. I could probably help with testing as a user, though.

Clivuus commented 5 months ago

I have been trying to update TDHelper every time I play Elite Dangerous Odyssey, but it seems to have stopped updating about 7 months ago. Hopefully something will happen soon. I would also be happy to help with testing a new and improved version.

eyeonus commented 5 months ago

TDHelper is run by somebody else; it's not something I have anything to do with.

spansh commented 5 months ago

I'm more than happy to help populate data. We have the new system dumps at https://spansh.co.uk/dumps, which are purely system data (no bodies). However, if you also want station data you can grab the full galaxy file, though that's probably a little large for players to download.

If you only care about station data you can get galaxy_stations.json.gz. That contains all known data about every system which contains a station, including player fleet carriers. I'm parsing that for my new trade router and it takes 3-5 minutes using RapidJSON. I'm not as familiar with Python, but there are fast JSON parsers available, and if you're worried about memory usage and don't have access to a SAX/streaming parser, I've made some concessions to make it relatively easy to create a streaming parser for those files manually.

If you'd like more help with this you can catch me on the EDCD Discord.
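
For example, a hand-rolled streaming reader along the lines spansh describes might look like this (a minimal sketch that leans on the one-system-per-line layout; error handling omitted):

import gzip
import json

def iter_systems(path='galaxy_stations.json.gz'):
    # One JSON object per line; the enclosing '[' and ']' sit on their own lines.
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            line = line.strip().rstrip(',')
            if line in ('[', ']', ''):
                continue
            yield json.loads(line)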


Tromador commented 5 months ago

I'm more than happy to help populate data. We have the new system dumps at https://spansh.co.uk/dumps which are purely system data (no bodies). However if you also want station data you can grab the full galaxy file though that's probably a little large for players to download.

Thanks for the offer of support. Big files don't really scare me. Potentially we can have the server grab it and output something smaller for clients. I always used to have the server configured to grab and hold more data than the average user would download, at least by default (they could still grab it via options if they really wanted it).

I too was hoping for this PR from @EyeMWing. That said, with @spansh willing to help with a reliable data source, I am willing to run up the TD server on the current code, on the assumption we can start looking again at some of the long-standing issues - I mean, it does work, but there were some niggles.

Assuming we do that, I would ask for patience (especially from @eyeonus 🙂); it's been a very long time since I looked at this, and the brain fog from my illness and associated meds will likely have me going over old ground previously discussed as though it never happened. I know this can be a little frustrating at times; it certainly annoys me when I know my cranium isn't firing on all cylinders.

EyeMWing commented 5 months ago

I'm still here, just got pulled away from ED for a little bit by some priority work. I was actually right in the middle of trying to work out a solution for new star systems - looks like we've got a solution for that now. I've got some time this evening, will pull down the dump and see about getting it parsed. Shouldn't be too bad.


Tromador commented 3 months ago

@EyeMWing You posted over a month ago that you had some time "this evening". Please have a think and honestly decide whether you have the time and inclination to do this work. If you don't, that's fine, everything here is voluntary; we'll decide if/how we want to proceed without you, and that's OK. Conversely, if you still intend to put in your promised PR, please can you do so? It's not really fair to tell us you have solutions for these problems - you said in December that you had code running and working on your system - but never send the PR. Perhaps, if you've lost interest, you might simply send what you have so we can look it over and use it?

lanzz commented 3 months ago

I had a bit of free time today, so I put together a quick parser for @spansh's dump files. I did some (very cursory) research into fast JSON parsers and settled on cysimdjson. It can ingest the 8.8GB (uncompressed size) galaxy_stations.json in about 23 seconds on my M1 Pro MacBook (without doing anything with the data; that's just load time). It processes the input line by line to avoid needing insane amounts of memory, which means it makes some assumptions about the format of the galaxy dumps, namely that each system is on a single line and that the first and last lines of the JSON are the opening and closing square brackets.

Here it is as a proof of concept:

import cysimdjson
from collections import namedtuple

DEFAULT_INPUT = 'galaxy_stations.json'

Commodity = namedtuple('Commodity', 'name,sell,buy,demand,supply,ts')

def ingest(filename):
    parser = cysimdjson.JSONParser()
    with open(filename, 'r') as f:
    f.readline()    # skip over initial open bracket
        for line in f:
            line = line.rstrip().rstrip(',')
            if line == ']':
                # end of dump
                break
            system_data = parser.loads(line)
            yield from _ingest_system_data(system_data)

def _ingest_system_data(system_data):
    for station_name, update_time, commodities in _find_markets_in_system(system_data):
        yield f'{system_data["name"]}/{station_name}', _ingest_commodities(commodities, update_time)

def _ingest_commodities(commodities, update_time):
    for category, category_commodities in commodities.items():
        yield category, _ingest_category_commodities(category_commodities, update_time)

def _ingest_category_commodities(commodities, update_time):
    for commodity, market_data in commodities.items():
        yield Commodity(
            name=commodity,
            sell=market_data["sellPrice"],
            buy=market_data["buyPrice"],
            demand=market_data["demand"],
            supply=market_data["supply"],
            ts=update_time,
        )

def _find_markets_in_system(system_data):
    for station in system_data['stations']:
        if 'Market' not in station.get('services', []):
            continue
        if not station.get('market', {}).get('commodities', []):
            continue
        yield (
            station['name'],
            station['market'].get('updateTime', None),
            _categorize_commodities(station['market']['commodities']),
        )

def _categorize_commodities(commodities):
    commodities_by_category = {}
    for commodity in commodities:
        commodities_by_category.setdefault(commodity['category'], {})[commodity['name']] = commodity
    return commodities_by_category

if __name__ == '__main__':
    print('#     {name:35s}  {sell:>7s}  {buy:>7s}  {demand:>10s}  {supply:>10s}  {ts}'.format(
        name='Item Name',
        sell='SellCr',
        buy='BuyCr',
        demand='Demand',
        supply='Supply',
        ts='Timestamp',
    ))
    print()
    for station_name, market in ingest(DEFAULT_INPUT):
        print(f'@ {station_name}')
        for category, commodities in market:
            print(f'   + {category}')
            for commodity in commodities:
                print('      {name:35s}  {sell:7d}  {buy:7d}  {demand:10d}  {supply:10d}  {ts}'.format(
                    name=commodity.name,
                    sell=commodity.sell,
                    buy=commodity.buy,
                    demand=commodity.demand,
                    supply=commodity.supply,
                    ts=commodity.ts,
                ))
        print()

That POC prints out the result in Trade Dangerous's prices format, but it is intended to provide the data in a programmatically convenient way, so it doesn't necessarily need to pass through a conversion step; Trade Dangerous could potentially just load the prices directly from the galaxy dumps.

spansh commented 3 months ago

You can trust that assumption about the file format; I specifically formatted it that way so that people without streaming JSON parsers can roll their own easily. One option for parsing would be pysimdjson, which is a binding for what is purportedly the fastest JSON parser currently available.


lanzz commented 3 months ago

cysimdjson, which I went with, is also supposed to wrap the same underlying JSON implementation (simdjson), but I'll benchmark pysimdjson tomorrow.

lanzz commented 3 months ago

I've fixed a bug: it wasn't picking up surface stations, so ingestion times have now jumped to the 50-70 second range. Here's the latest iteration, supporting both cysimdjson and pysimdjson:

import cysimdjson
import simdjson
import time
from collections import namedtuple

DEFAULT_INPUT = 'galaxy_stations.json'
DEFAULT_PARSER = cysimdjson.JSONParser().loads
ALT_PARSER = lambda line: simdjson.Parser().parse(line)

Commodity = namedtuple('Commodity', 'name,sell,buy,demand,supply,ts')

def ingest(filename, parser):
    """Ingest a spansh-style galaxy dump and emits a generator cascade yielding the market data."""
    with open(filename, 'r') as f:
        f.readline()    # skip over initial open bracket
        for line in f:
            line = line.rstrip().rstrip(',')
            if line == ']':
                # end of dump
                break
            system_data = parser(line)
            yield from _ingest_system_data(system_data)

def _ingest_system_data(system_data):
    for station_name, update_time, commodities in _find_markets_in_system(system_data):
        yield f'{system_data["name"].upper()}/{station_name}', _ingest_commodities(commodities, update_time)

def _ingest_commodities(commodities, update_time):
    for category, category_commodities in commodities.items():
        yield category, _ingest_category_commodities(category_commodities, update_time)

def _ingest_category_commodities(commodities, update_time):
    for commodity, market_data in commodities.items():
        yield Commodity(
            name=commodity,
            sell=market_data["sellPrice"],
            buy=market_data["buyPrice"],
            demand=market_data["demand"],
            supply=market_data["supply"],
            ts=update_time,
        )

def _find_markets_in_system(system_data):
    # look for stations in the system and on all bodies
    targets = [system_data, *system_data.get('bodies', [])]
    for target in targets:
        for station in target.get('stations', []):  # bodies may have no 'stations' key
            if 'Market' not in station.get('services', []):
                continue
            if not station.get('market', {}).get('commodities', []):
                continue
            yield (
                station['name'],
                station['market'].get('updateTime', None),
                _categorize_commodities(station['market']['commodities']),
            )

def _categorize_commodities(commodities):
    commodities_by_category = {}
    for commodity in commodities:
        commodities_by_category.setdefault(commodity['category'], {})[commodity['name']] = commodity
    return commodities_by_category

def benchmark(filename, parser, parser_name=None, iterations=5):
    """Benchmark a JSON parser.

    Prints timing for consuming the entire stream, without doing anything with the data.
    """
    times = []
    for _ in range(iterations):
        start_ts = time.perf_counter()
        stream = ingest(filename, parser)
        for _, market in stream:
            for _, commodities in market:
                for _ in commodities:
                    pass
        end_ts = time.perf_counter()
        elapsed = end_ts - start_ts
        times.append(elapsed)
    min_time = min(times)
    avg_time = sum(times) / len(times)
    max_time = max(times)
    if parser_name is None:
        parser_name = repr(parser)
    print(f'{min_time:6.2f} {avg_time:6.2f} {max_time:6.2f}  {parser_name}')

def benchmark_parsers(filename=DEFAULT_INPUT, **parsers):
    """Benchmark all parsers passed in as keyword arguments."""
    for name, parser in parsers.items():
        benchmark(filename, parser, parser_name=name)

def convert(filename, parser=DEFAULT_PARSER):
    """Converts spansh-style galaxy dump into TradeDangerous-style prices."""
    print('#     {name:35s}  {sell:>7s}  {buy:>7s}  {demand:>10s}  {supply:>10s}  {ts}'.format(
        name='Item Name',
        sell='SellCr',
        buy='BuyCr',
        demand='Demand',
        supply='Supply',
        ts='Timestamp',
    ))
    print()
    for station_name, market in ingest(filename, parser=parser):
        print(f'@ {station_name}')
        for category, commodities in market:
            print(f'   + {category}')
            for commodity in commodities:
                print('      {name:35s}  {sell:7d}  {buy:7d}  {demand:10d}  {supply:10d}  {ts}'.format(
                    name=commodity.name,
                    sell=commodity.sell,
                    buy=commodity.buy,
                    demand=commodity.demand,
                    supply=commodity.supply,
                    ts=commodity.ts,
                ))
        print()

if __name__ == '__main__':
    benchmark_parsers(
        cysimdjson=DEFAULT_PARSER,
        pysimdjson=ALT_PARSER,
    )

I've benchmarked them and pysimdjson seems to be noticeably faster:

# min / avg / max time
 67.54  67.71  67.81  cysimdjson
 49.94  50.86  51.97  pysimdjson

eyeonus commented 3 months ago

Very nice. Do me a favour and submit a PR for this, formatted as an import plugin.
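
For orientation, a hedged skeleton of what such an import plugin might look like; the class name, base class, hooks, and option names here are assumptions modelled on TD's existing plugins, not verified signatures:

from tradedangerous import plugins

class ImportPlugin(plugins.ImportPluginBase):
    """Hypothetical sketch: import market data from a spansh galaxy dump."""

    pluginOptions = {
        'url': "Alternate URL to fetch the dump from (assumed option name)",
        'file': "Path to a previously downloaded dump (assumed option name)",
    }

    def run(self):
        # 1. fetch or open the dump, 2. stream-parse it (see the POC above),
        # 3. write out a TradeDangerous.prices file for TD to load.
        return False  # assumption: returning False skips TD's default import step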

lanzz commented 3 months ago

Yeah, that's WIP, I was just focusing on getting the parsing logic right first 👍

bgol commented 3 months ago

You don't need the category in the price file (saves some bytes), see: https://github.com/eyeonus/Trade-Dangerous/blob/master/tradedangerous/cache.py#L583-L586
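
In other words, a prices file can list items directly under the station line, with no "+ Category" rows. An illustrative hand-written snippet (the station and the numbers are made up):

@ SOL/Abraham Lincoln
      Gold       47412   48100    1500       0   2024-03-16 00:00:00
      Silver      4500    4700    8000       0   2024-03-16 00:00:00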

Tromador commented 3 months ago

@lanzz Probably a stupid question, but I'd rather ask and not need to: I presume that carriers count as "stations in the system"?

lanzz commented 3 months ago

Yes, @spansh's dumps list them as stations. I still need to figure out a mapping between spansh station types (which are strings like "Settlement" or "Drake-Class Carrier" or "Coriolis Starport") and TradeDangerous types (which are integers and don't seem to be documented much) in order to keep them clearly distinguishable.

lanzz commented 3 months ago

@spansh Where do these dumps come from? I'm calling them "spansh dumps" everywhere, but I have no idea how accurate that is, or whether I'm failing to attribute a more authoritative source.

eyeonus commented 3 months ago

Yes, @spansh's dumps list them as stations. I still need to figure out a mapping between spansh station types (which are strings like "Settlement" or "Drake-Class Carrier" or "Coriolis Starport") and TradeDangerous types (which are integers and don't seem to be documented much).

TD doesn't really use the type_id for much. Feel free to expand the list if desired. https://github.com/eyeonus/Trade-Dangerous/blob/6f266ac98b7370f7d7e4322ad65fe55ea2c7fc50/tradedangerous/tradedb.py#L1203-L1214

https://github.com/eyeonus/Trade-Dangerous/blob/6f266ac98b7370f7d7e4322ad65fe55ea2c7fc50/tradedangerous/tradedb.py#L1288C1-L1292C25

        type_id = 0
        if fleet == 'Y':
            type_id = 24
        if odyssey == 'Y':
            type_id = 25
lanzz commented 3 months ago

@eyeonus Yeah, I pretty much figured out that much, and the fleet type is more or less clear (there's only one type of fleet carrier, at least at the moment), but I'm not sure what qualifies as an "Odyssey settlement".

spansh commented 3 months ago

They get built every night at https://spansh.co.uk/dumps, so the attribution is probably right.


eyeonus commented 3 months ago

@eyeonus Yeah, I pretty much figured out that much, and the fleet type is more or less clear (there's only one type of fleet carrier, at least at the moment), but I'm not sure what qualifies as an "Odyssey settlement".

Ones you can only access if you have the Odyssey DLC.

lanzz commented 3 months ago

@eyeonus I understand the semantics; I don't know how to determine that from the spansh data 😄

eyeonus commented 3 months ago

All the ones with a station type of Settlement: https://spansh.co.uk/stations/search/326378B6-E3EE-11EE-8BB4-ECCDBF0F5377/1
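
Putting the thread's answers together, a hedged sketch of the resulting mapping (the type strings come from the spansh dumps, the 24/25 IDs from the tradedb.py snippet above; anything else falls back to 0, since TD doesn't rely on type_id much):

def td_type_id(spansh_type):
    # Special-case the two types TD's import code cares about.
    if spansh_type == 'Drake-Class Carrier':
        return 24  # fleet carrier
    if spansh_type == 'Settlement':
        return 25  # Odyssey-only surface settlement
    return 0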

lanzz commented 3 months ago

Here's the PR: https://github.com/eyeonus/Trade-Dangerous/pull/117. The plugin supports downloading the data directly from https://spansh.co.uk/dumps, as well as ingesting a locally downloaded file. Importing from a local file takes about 7 minutes on my machine; downloading from remote takes 53 minutes on my shitty broadband. That's only to the point where the TradeDangerous.prices file is updated; the built-in import command takes extra time on top of that to load the prices. I'd say the JSON performance is moot, as network overhead will always dwarf JSON parsing time.

eyeonus commented 3 months ago

Merged

aadler commented 3 months ago

@lanzz may we trouble you to write some usage notes on how to use the plugin with TradeDangerous, please? Something akin to what is here: https://github.com/eyeonus/Trade-Dangerous/wiki/Plugin-Options?

lanzz commented 3 months ago

Spansh

The Spansh plugin imports market data from https://spansh.co.uk/dumps/galaxy_stations.json.gz.

Basic Usage

# download the data from the default location:
$ trade import -P spansh

# download the data from an alternate location:
$ trade import -P spansh -O url=https://example.com/alternate/source/galaxy_stations.json

# import the data from a previously downloaded file:
$ trade import -P spansh -O file=previously/downloaded/galaxy_stations.json

# increase verbosity (use `-vv` for even more details):
$ trade import -P spansh -v

eyeonus commented 3 months ago

I removed the bit about verbosity, because that's a common option for all TD commands, but otherwise it's been copied verbatim to https://github.com/eyeonus/Trade-Dangerous/wiki/Plugin-Options

lanzz commented 3 months ago

Embarrassing, but I already have a bugfix for my work 😅 https://github.com/eyeonus/Trade-Dangerous/pull/118

eyeonus commented 3 months ago

I get an email alert for all pull requests, so you don't need to also comment here. Just FYI, no slight intended.

Tromador commented 3 months ago

This is a big step in the right direction, but there's more to do, inside TD and the TD listener (https://github.com/eyeonus/EDDBlink-listener/issues/25)

We have the listener picking up data from the EDDN ZeroMQ feed and calling TD to produce regular pricing data throughout the day, which can be picked up from my server via the EDDBLink plugin. This used to fall back to EDDB if my server was down for any reason, but that won't work any more. We could still pick up system data from Spansh, but they don't keep any pricing.

Which means (I think, but my brain is rarely firing well these days) that we need the listener to pick up data from Spansh and EDDN, and the clients to pick up from the server, with no alternate backup. Speaking of which, the server won't have any meaningful source of initial price data (unless someone knows of a source), so we'll be reliant on the ZeroMQ feed to populate that over the first hours or days of running.

Finally, the server was unhappy in nebulous ways that we were looking at trying to pin down when EDDB died. Without any actual testing data to support my hypothesis, my gu (at this point my PC threw the first STOP error since I bought it months ago, but somehow the above didn't get sacrificed to the $DEITY of bluescreen, phew!) my gut feeling is that the database was big and fat and unhappy. I don't know how well it scales; bear in mind a lot of this was originally written by Oliver, maybe as far back as closed beta (I really don't remember when he did it), and it was all based on importing from .prices files created in various ways (EDMC still supports this, last I checked with them), not from live sources or other people's overnight data dumps, and in a much smaller universe: a smaller bubble, no ground stations, no whole second bubble, and so on. Generally much less data.

It's possible that getting data in a fundamentally new way will alleviate this. We have patched in multiple systems to cope with picking up data from EDDB; there's a whole set of stuff for sanitising and sanity-ising data which is probably redundant now. Possibly just getting system data and prices from EDDN will make the whole thing work better, as it's simpler, though perhaps a bit less reliable without the fallback.

Anyway, the bottom line is that the rest of the code needs adapting to the new workflow. EDDBLink is so ingrained into the way TD works that some amount of unpicking is required to remove it and let TD work with the available data sources.

spansh commented 3 months ago

The galaxy dumps contain full, up-to-date prices for every station we have data for. I'm not sure why you keep saying that my site doesn't provide prices.


Tromador commented 3 months ago

@spansh Because I'm wrong, apparently. That is actually really helpful to know.

Tromador commented 3 months ago

I should add that I have been suffering from sarcoidosis and chronic pain for years. That causes brain fog even without the litany of drugs I have to take, so I'm often wrong and/or forgetful. Not looking for sympathy here, just understanding :).

(and pre-empting Jonathan before he gives me a hard time about the stuff I mentioned which he already addressed last April!)

eyeonus commented 2 months ago

@spansh Would you be willing to add a 1-day version of galaxy_stations.json.gz? I'm testing the listener, and the spansh import plugin is taking ~3 hours to do an update.

I don't think it's a good idea to have the TD server down for multiple hours while it's updating.