eyeonus / Trade-Dangerous

Mozilla Public License 2.0

Cleanup Pass #124

Closed kfsone closed 2 months ago

kfsone commented 2 months ago

I'm trying to revisit various decisions I made during the early days that now either cause performance problems in IDEs or get linters really bent out of shape.

There's also some work here to try to beautify things - although I'm introducing those changes sparingly and doing them in-place in the plugin where they seem relevant.

There's some small amount of performance tuning too, but it's not going to make a noticeable difference yet.

Strongly recommend you review the individual changes; I think trying to read the entire diff would be brain-hurting...

I'm creating a pull-request before I've had a chance to finish watching a complete import, mostly because of the size and to give eyeonus a chance to ask others to review and/or reject/request changes.

eyeonus commented 2 months ago

Everything looks good code-wise, I just have to ask:

Why are some of your commits using the angular prefixes and some aren't?

eyeonus commented 2 months ago

Some tests are failing in tox:

tests/test_trade.py:76: AssertionError
=========================== short test summary info ============================
FAILED tests/test_cache.py::TestCache::test_parseSupply - assert 2 == 3
FAILED tests/test_trade.py::TestTrade::test_buy - assert 'Cost Units Dis...
FAILED tests/test_trade.py::TestTrade::test_nav - AssertionError: assert 'Sys...
FAILED tests/test_trade.py::TestTrade::test_market - AssertionError: assert F...
=================== 4 failed, 51 passed, 4 skipped in 1.46s ====================

kfsone commented 2 months ago

> Everything looks good code-wise, I just have to ask:
>
> Why are some of your commits using the angular prefixes and some aren't?

Not familiar with the term?

Also I have another commit to add: I'd added a UNIQUE(fdev_id) to StationItem and dumpPrices did not like that.

Saw your fix on the supply levels, good catch; also saw you had to fix my whitespace - I'm trying to turn off the settings I have locally to strip trailing whitespace. LMK if I'm still doing it.

kfsone commented 2 months ago

Do you have contributor docs for setting up tox / etc?

kfsone commented 2 months ago

@eyeonus BTW - What IDE/tooling are you using? (I mostly use VSCode, but when I'm trying to be good about python I use PyCharm, and I'm just setting that up for td now)

kfsone commented 2 months ago

Looking at the errors: it looks like we weren't ever actually running flake8 when running all the tests. Once flake8 is enabled ... it warns. A lot -- see incoming extra change to this branch.
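If we want to tame that incrementally, the usual move is a repo-level config along these lines (illustrative settings only, not what's actually in the branch):

# setup.cfg (or .flake8) - hypothetical excerpt
[flake8]
max-line-length = 120
extend-ignore = E221, E266
exclude = .tox, build, dist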

kfsone commented 2 months ago

One thing a lot of python linters try to enforce is the distinction between

if happy:
    life = good

# ...and a call that passes a keyword argument:
if happy(
    life=good
):
    ...

so the standard formatters/styles expect keyword assignments to have no space around the equals, and include that in default arguments:

def poop(stink=bad):

unless there's a type annotation:

def poop(stink: Level = bad):

kfsone commented 2 months ago

I'm happy to fix-up code or configure linters - just lmk your preferences.

tox -e flake8

with this branch will warnapalooza at ya.

eyeonus commented 2 months ago

I'm at work right now, so just a quick reply:

I'm currently using Eclipse with PyDev, planning on switching over to PyCharm when I have the free time.

angular is a commit-message style that python-semantic-release uses to determine whether to increment the version number, whether to publish a release, to automatically annotate the release with what changed since the last release, etc.

chore:, fix:, refactor: are examples
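e.g., full messages look like this (made-up examples, just to show the type(scope): subject shape):

fix(cache): correct supply-level parsing
feat(plugins): add spansh import support
chore: bump dev dependencies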

kfsone commented 2 months ago

Did not know about that, will adjust accordingly. Like it.

I was doing a burn-thru of warnings and errors, getting myself back to grips with the code and tooling - hopefully I got us closer to being able to use pylint, which has saved my bacon hard several times, even though it's a pain sometimes.

Tomorrow I'll actually fix the errors - I think I know what they are already, and I wanted to have my IDE running so I could step thru meaningfully and see it for sure :)

eyeonus commented 2 months ago

https://python-semantic-release.readthedocs.io/en/latest/commit-parsing.html

kfsone commented 2 months ago

I'll fix the commit messages in the morning. For now, I just checked in the fix for what was causing the test failures.

kfsone commented 2 months ago

That should clean up the descriptions.

eyeonus commented 2 months ago

Regarding the fdev_id in the Item table: We're using the fdev_id as the item_id nowadays, so it wouldn't be a bad idea to refactor the fdev_id away completely.

I believe the only things that use the fdev_id are my eddblink plugin, the spansh plugin, and the listener, which is a different repo.

eyeonus commented 2 months ago

On another note, that's a lot of work done while I was at work. Color me impressed. And thanks for cleaning up the commit messages.

eyeonus commented 2 months ago

Just a heads-up, someone wrote commandenv.colorize() and used it to color the output in tradecalc, not sure if that'll interfere with the rich coloring you've been doing.

kfsone commented 2 months ago

@eyeonus I saw that, and left it alone -- the colorizing is just automatic by writing through rich, and the colorize implementation in tradecalc could be replaced with use of the rich theming.
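Something along these lines would do it (a sketch - the style names are invented, not anything in the branch):

import rich.console
import rich.theme

# Map semantic style names to colors once, then mark up output by name.
theme = rich.theme.Theme({"profit": "bold green", "loss": "bold red"})
console = rich.console.Console(theme=theme)
console.print("[profit]+18,398,126cr[/profit] on this route")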

Also - the build failure above is failing on deploying to pypi which I would hope I'm guaranteed to fail on from my branch :)

Right now I'm looking at some ideas from the "1 billion rows challenge" and how people solved that with python. The first kick in the teeth was a nice simple one: it's way faster to process files in binary or even ascii than utf-8. In the code where we're already using "blocks()" to speed it up, you can make it 8x faster by opening the file in 'rb' and then counting b'\n'. I'm doing more tests.
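For illustration, the shape of that trick (a sketch, not the current blocks() code - the names are mine):

def count_lines(path, block_size=1 << 20):
    # Read raw bytes so python never pays for utf-8 decoding.
    count = 0
    with open(path, "rb") as fh:
        while block := fh.read(block_size):
            count += block.count(b"\n")  # bytes.count is a tight C loop
    return count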

Also - I saw the optimization to go back from building the list to periodically closing the database. Agh. We're caught here between just needing somewhere to store the data and using SQL because we have a database as our store.

The idea really was that people would go thru TradeDB directly, with sqlite being a fallback way to access the data if something was amiss or went wrong.

I wonder if we shouldn't just get rid of the DB back end entirely and just pickle the python data. That used to be slow but it's fast as fek these days, and it would mean one source of truth rather than two. We already make it so that during imports you build a -new- database with the changes; this wouldn't be any different.

It would mean that you couldn't inspect the data with sqlite or anything.

eyeonus commented 2 months ago

> @eyeonus I saw that, and left it alone -- the colorizing is just automatic by writing through rich, and the colorize implementation in tradecalc could be replaced with use of the rich theming.
>
> Also - the build failure above is failing on deploying to pypi which I would hope I'm guaranteed to fail on from my branch :)

Yup. That one only "succeeds" if the branch is release/v1, because obviously we don't want to publish releases otherwise.

eyeonus commented 2 months ago

As far as the DB is concerned, everything in TD uses it, including TradeDB. The csv and prices files exist solely for rebuilding the DB if something goes wrong and it gets corrupted, lost, etc.

eyeonus commented 2 months ago

Also:

https://docs.python.org/3/library/pickle.html:

Warning

The pickle module is not secure. Only unpickle data you trust.

It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

Consider signing data with hmac if you need to ensure that it has not been tampered with.

Safer serialization formats such as json may be more appropriate if you are processing untrusted data. See Comparison with json.
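The hmac approach they mention looks roughly like this (a generic sketch, not a suggestion that we actually do this):

import hashlib
import hmac
import pickle

SECRET = b"shared-secret"  # placeholder key; both sides must know it

payload = pickle.dumps({"data": [1, 2, 3]})
tag = hmac.new(SECRET, payload, hashlib.sha256).digest()

# ...later, before trusting the payload...
expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
if hmac.compare_digest(tag, expected):
    obj = pickle.loads(payload)  # only unpickle once the tag checks out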

kfsone commented 2 months ago

Yeah, we wouldn't be replacing the exchange formats with pickle. It's just a super fast way to "freeze" a snapshot of python's memory representation of objects:

import pickle

with open("test.pkl", "wb") as fh:
    pickle.dump({"data": [1, 2, 3], "names": {"you", "me"}}, fh)

with open("test.pkl", "rb") as fh:
    restored = pickle.load(fh)

gives

>>> print(restored)
{'data': [1, 2, 3], 'names': {'me', 'you'}}

But I don't think it does as good a job at save/restore for entire object trees (as opposed to lists/tables of complex objects) - the case where a System holds a direct link to the Stations we want, rather than just an id reference.

I took a look at what it would require to pull off sanely, and the work to describe the structures nicely and modern-pythonically also brings you the entire way towards what would need doing to describe the schema to SQLAlchemy, at which point, when SQLite is a bottleneck, people could simply swap the backend engine:

if getOption("use_mysql"):
    tradedb.engine = SqlAlchemy.Engines.MySQL  # or so
elif getOption("use_postgres"):
    tradedb.engine = SqlAlchemy.Engines.Postgres  # ...or some other giant_turd
    ...

class TradeDB:
    def get_db(self):
        if self.db:
            return self.db
        self.db = self.engine.connect(...)
        return self.db

# and queries become engine-agnostic:
sol_system = sqlalchemy.select(System).where(System.name == "Sol")
print("sol is at", sol_system.pos_x, sol_system.pos_y, sol_system.pos_z)

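For concreteness, describing the schema "modern-pythonically" would look something like this 2.0-style declarative sketch (column names guessed from TD's System table, purely illustrative):

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class System(Base):
    __tablename__ = "System"
    # Typed attributes double as both the schema description and the ORM mapping.
    system_id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    pos_x: Mapped[float]
    pos_y: Mapped[float]
    pos_z: Mapped[float]
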
kfsone commented 2 months ago

[screenshot attached]

eyeonus commented 2 months ago

Let me just say, from my attempts to rein in the import time on eddblink with what is now a 2.5GB file, that SQL is definitely a bottleneck. It runs pretty quickly for the first 22,000,000 entries or so, but once the DB is over 3GB in size, it slows way down.

I mean, it gets slightly slower the whole way through (inserting item 20,000 takes slightly longer than inserting item 10,000), but the difference becomes noticeable around 50% and gets worse from there.
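For reference, the standard mitigations for that pattern look something like this (generic sqlite3 sketch - the table and columns are placeholders, not TD's import code):

import sqlite3

conn = sqlite3.connect("TradeDangerous.db")
conn.execute("PRAGMA journal_mode = WAL")    # cheaper commits
conn.execute("PRAGMA synchronous = NORMAL")

def bulk_insert(rows, batch_size=100_000):
    # One transaction per batch instead of per row keeps fsync costs down.
    cur = conn.cursor()
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            cur.executemany("INSERT INTO demo VALUES (?, ?, ?)", batch)
            conn.commit()
            batch.clear()
    if batch:
        cur.executemany("INSERT INTO demo VALUES (?, ?, ?)", batch)
        conn.commit()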

kfsone commented 2 months ago

Speaking of colorized: I have this stash and will revisit another time, but by reducing Progress to a simple facade over rich.Progress, it looks like this:

https://github.com/eyeonus/Trade-Dangerous/assets/323009/1d34e3f5-97af-46d5-b79b-c0f43802c4ef
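The facade itself is tiny - something of this shape (hypothetical method names, not the actual stash):

import rich.progress

class Progress:
    # Forwards old misc/progress-style calls to a rich progress bar.
    def __init__(self, max_value: int, width: int = 25):
        self.bar = rich.progress.Progress()
        self.task = self.bar.add_task("processing", total=max_value)
        self.bar.start()

    def increment(self, value: int = 1):
        self.bar.update(self.task, advance=value)

    def clear(self):
        self.bar.stop()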

eyeonus commented 2 months ago

Nice. Would that happen on everything that uses misc/progress, or is it specific to that?

kfsone commented 2 months ago

In the first pass it would all look kinda like that, and then a later pass can get rid of misc/progress.py and just use the rich one directly. But I'd want to visit each one first so I'm parking it for now :) I guess I should commit it to a branch rather than a stash. Too much time in perforce makes you afraid to branch, lol.

Back to this ticket itself: I'm pushing a final rebase and then I'm going to use it on both my mac and win box to do an eddblink import and a spansh import and see how that goes. I guess that'll take an hour?

I don't have ED installed on either machine at the moment, and I haven't played in so long I'm not sure what would be a good set of commands to flex to make sure we're actually good.

eyeonus commented 2 months ago

What kind of flexing are you looking for?

eyeonus commented 2 months ago

Are you talking about this kind of thing?

[eyeonus@panther Trade-Dangerous]$ trade run --capacity 400 --credits 500b --from sol --loop --ly-per 20 --no-planet --odyssey N --hops 4
NOTE: Pruned 459 origins too far from any end stations
Sol/Daedalus -> Sol/Daedalus
  Sol/Daedalus: 400 x Insulating Membrane,
  Wyrd/Bokeili Station: 183 x Bertrandite, 52 x Silver, 152 x Gallite, 12 x Gold, 1 x Palladium,
  Wolf 433/Ellis Orbital: 400 x Water Purifiers,
  Wyrd/Bokeili Station: 183 x Bertrandite, 153 x Gallite, 52 x Silver, 12 x Gold,
  Sol/Daedalus +18,398,126cr (11,498/ton)
Tromador commented 2 months ago

Not sure what you are wanting to test with your "flex". If you want to exercise TD, maybe give it a big complex run command.

Something like this:

trade run --from shin/jamesonm --cap 384 --cr 408015634 --insurance 1943578 --ly 37.27 --empty-ly=48.19 --margin 0.04 --hops 5 --jum 3 --unique -P -v

Will give it some pain as it will basically have to make a route that can go pretty much anywhere in the bubble for each hop, so it has to evaluate a stupid and ridiculous tree. I would never run that. In fact, that's something I did run, but with all the filters for age and pruning and what have you all removed.

If that's not what you want to do, you'll have to be more specific about what you particularly want to 'flex'.

kfsone commented 2 months ago

nothing heavy, just some routine commands I could run on each box to make sure the data didn't just get ingested but it also didn't get digested ;)

eyeonus commented 2 months ago

You could run the commands tox runs, to check nothing is broken?

eyeonus commented 2 months ago

Maybe ask around on the EDCD discord? I'm sure the people there would have lots of suggestions

kfsone commented 2 months ago

I haven't been able to test on the mac; it takes forever downloading the json file, and without progress updates: the http headers tell us the gzip'd size, but requests decompresses the file as it goes, and that .json file is, uhm, let's say very compressible? It's not compressing by a percentage, it's nearly compressing by an order of magnitude :)
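One way to report progress against the compressed size anyway (a sketch - the URL is a stand-in, and I'm leaning on urllib3's raw.tell() reporting bytes pulled over the wire):

import requests

url = "https://example.com/galaxy_stations.json"  # placeholder, not the real source
resp = requests.get(url, stream=True, timeout=60)
total = int(resp.headers.get("Content-Length", 0))  # compressed size on the wire

with open("galaxy_stations.json", "wb") as out:
    for chunk in resp.iter_content(chunk_size=1 << 20):  # decompressed bytes
        out.write(chunk)
        done = resp.raw.tell()  # compressed bytes fetched so far
        print(f"\rdownloaded {done:,}/{total:,} bytes", end="")
print()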

eyeonus commented 2 months ago

Yeah, it'd be nice if the download had progress updates.

Also, yeah, 1.4 GB -> 8.9 GB is quite the compression.

kfsone commented 2 months ago

This will conflict after you merge the transfer-timeouts fix, but after that, if you're ready to merge this one, I think she's good.

eyeonus commented 2 months ago

Something got borked:

[elite@quoth tradedangerous]$ trade buildcache -f
NOTE: Rebuilding cache file: this may take a few moments.
/home/elite/.local/bin/trade: /home/elite/tradedangerous/tddata/TradeDangerous.prices:19 ERROR Unrecognized line/syntax,
got: "Agronomic Treatment 5430 5568 ? 489H 2024-04-23 20:49:51".

That's the first commodity in the file, and it has the correct format.

eyeonus commented 2 months ago

I'm fairly certain it's from the changes made to newItemPriceRe in cache.py
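For reference, the failing line parses with a pattern along these lines (purely illustrative - this is not the actual newItemPriceRe):

import re

line = 'Agronomic Treatment 5430 5568 ? 489H 2024-04-23 20:49:51'
# name, sell cr, buy cr, demand, supply (with L/M/H level), modified timestamp
price_re = re.compile(
    r"^(?P<item>.*?)"
    r"\s+(?P<sell>\d+)\s+(?P<buy>\d+)"
    r"\s+(?P<demand>\?|-|\d+[LMH?]?)"
    r"\s+(?P<supply>\?|-|\d+[LMH?]?)"
    r"\s+(?P<modified>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*$"
)
match = price_re.match(line)
print(match.groupdict() if match else "no match")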

eyeonus commented 2 months ago

Yup, that was it. Reverting it fixed the problem.