c-blake / adix

An Adaptive Index Library for Nim
ISC License

Namespacing behind adix/metab #1

Closed · mratsim closed 4 years ago

mratsim commented 4 years ago

Currently, to import adix when it is installed from nimble, you need to use `import metab`. It would be easier to track down the adix dependency in a big codebase if the import were `import adix/metab`.

This can be changed by stuffing everything in an adix subfolder.
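For illustration only (file names besides metab.nim are placeholders, not the actual repo contents), this is the kind of layout change meant here, and the resulting import:

```nim
# Illustrative package layout only:
#
#   before:                       after:
#     adix.nimble                   adix.nimble
#     metab.nim                     adix/
#     ...                             metab.nim
#                                     ...
#
# With the nested layout, dependent code can write:
import adix/metab      # instead of: import metab
```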

c-blake commented 4 years ago

Coincidentally (or not?), just this morning I was breaking the Nim CI with an internal `import cligen/foo` that I had to change to `./foo`.

But I take your point. Personally, I always get annoyed by unnecessary hierarchy from weak tools and sometimes act against that without thinking, as I did when making this repo. It is, after all, already in a subdirectory called "adix/", and for reasons unknown nimble seems oriented toward not using that level. I realize git clone does not enforce that. This seems an inadequate reason to me.

More annoying still, I suppose, is not being a more cooperative nimble dependency :-), but I'll work on that tomorrow. It's about quitting time for me. Thanks for the report.

For now, you could just put the parent ("..") directory of wherever "adix/" lives into your module search path and import it however you like, but you probably know that.
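Concretely (the checkout path below is made up for illustration), that is just a --path entry in a nim.cfg next to your project, or on the nim command line:

```
# nim.cfg: add the parent of the adix/ clone to the module search path,
# e.g. if the clone lives at /path/to/clones/adix:
--path:"/path/to/clones"
```

After that, `import adix/metab` resolves to /path/to/clones/adix/metab.nim.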

mratsim commented 4 years ago

> for reasons unknown nimble seems oriented toward not using that level.

It's even worse when using src; nimble will flatten that as well.

Anyway, thanks! FYI, I'm planning to give it a test run for Monte Carlo simulation in the Go-playing bot that I set aside over 2 years ago (https://github.com/mratsim/golem-prime/blob/8953347e/src/montecarlo/mc_datatypes.nim#L5-L26), though I'm not sure if it'll stay. I'm still hesitating between multithreading the DAG updates via message passing, in which case Adix is a good fit, and something that requires either a concurrent hash table or DAG nodes backed by a thread-safe memory pool.

c-blake commented 4 years ago

It should be ok to do `import adix/metab` now. I'll let you close the issue once you have confirmed.

Cool project. Message passing with CSP is usually the "easy way" (see the whole Goroutine manifesto) as long as there is "enough work" behind each message. That said, I think we can eventually do an efficient lock-free concurrent hash with Robin Hood probing. That's probably better than Cuckoo hashing (at least in software) since you can stick to just 1 DRAM hit in the worst case. I just haven't done it yet. Lock-free linear probing is another option, probably fine with reliable hash functions.

{ I should finish the B-Tree first, though. All the core logic is there and works... I just need to do some kind of abstract allocation setup and provide some examples. I've mostly been waiting for feedback on that and/or procrastinating for a month. :-) Well, that and I should do a BST as a better "fixed size" B-Tree node. One can instead do a nested B-Tree to scale edit ops (indel) on large nodes (M > 24..32) when moveMem becomes a drag, but because B-Tree nodes are somewhere between 50% and 100% full, that introduces another "maybe not everything is filled" factor: "in" the outer nodes as well as "over" the outer nodes. 25..100% space utilization is a lamely large range. }
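Not adix code, but to make the lock-free linear probing option concrete, here is a minimal insert-only sketch in Nim. Everything about it is simplified for illustration: the names are made up, the capacity is fixed and power-of-two, keys are nonzero ints with 0 as the empty sentinel, the hash is a toy, atomics default to sequential consistency, and there is no deletion, resizing, or Robin Hood reordering (which are the genuinely hard parts).

```nim
import std/atomics

const Cap = 1 shl 10                    # fixed power-of-two capacity; no resizing

type
  LfSet = object
    slots: array[Cap, Atomic[int]]      # 0 = empty; keys must be nonzero

proc slotOf(key: int): int = (key *% 0x9E3779B1) and (Cap - 1)  # toy hash

proc insert(s: var LfSet; key: int): bool =
  ## Claim the first empty slot along the probe sequence with a CAS.
  assert key != 0
  var i = slotOf(key)
  for _ in 0 ..< Cap:
    let cur = s.slots[i].load()
    if cur == key: return true          # already present
    if cur == 0:
      var expected = 0
      if s.slots[i].compareExchange(expected, key):
        return true                     # we won the empty slot
      if expected == key:
        return true                     # another thread inserted the same key
      # else: a different key won this slot; keep probing
    i = (i + 1) and (Cap - 1)
  # fell off the probe loop: table is full; result stays false

proc contains(s: var LfSet; key: int): bool =
  var i = slotOf(key)
  for _ in 0 ..< Cap:
    let cur = s.slots[i].load()
    if cur == key: return true
    if cur == 0: return false           # insert-only => an empty slot ends the probe
    i = (i + 1) and (Cap - 1)
```

The CAS-and-retry loop is the whole trick for plain linear probing; Robin Hood on top of this has to move already-occupied slots, which is where it stops being this easy.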

Off-topic side comment inspired by your NimConf20 talk: if you are ever developing your own finicky search structure, then something like tests/ppss.nim & tests/btshell.nim feels like the best setup, almost in some absolute sense. You can use it as a playground like a Unix shell via `ppss | btshell` to add/del/whatever and print out the tree interactively, generate tests with a one-liner like `(for i in {1..200}; do echo A1 $i 2; done; echo D1 2; echo c; echo p) | ppss | btshell -c`, or pre-generate a benchmark workload with `{..whatever..} | ppss > input; btshell < input`, even including little commands to try to adjust for the (actually quite small) dispatch overhead (to the extent that is even meaningful, what with overlapping CPU pipelines & such); the same invocations are set off below for readability. Pre-parsed binary input rules. :-) I don't think this technique is taught in data structures courses, but it likely should be... Never tried it with "real students". :-)
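Those pipelines again, one per line (assuming the compiled tests/ppss and tests/btshell binaries are on PATH; the `{..whatever..}` placeholder is kept from above):

```bash
# interactive playground: type add/del/whatever commands and print out the tree
ppss | btshell

# generate a scripted test: 200 A1 (add) ops, one D1 (delete) op, then the c & p commands
(for i in {1..200}; do echo A1 $i 2; done; echo D1 2; echo c; echo p) | ppss | btshell -c

# pre-generate a benchmark workload once, then replay the pre-parsed binary input
{..whatever..} | ppss > input; btshell < input   # {..whatever..} = placeholder workload generator
```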

c-blake commented 4 years ago

Oh, also, in terms of both "set-aside projects from 2 years ago" and B-Trees (and mentioned mostly just in case Status has become boring and/or you somehow rapidly lose interest in your Go project and want to switch to another): I am pretty sure that a BIST that switches over to a B-Tree at giant sizes is the fastest way to do order statistics over moving data windows and/or sampling without replacement in a way that scales from tiny to massive dynamic collection sizes.

BIST/Fenwick trees will start being slower than B-Trees (prefetching or not) once the number of bins gets deep enough out of cache that the 2-way branching starts inducing DRAM hits. If you use the tricks of adix/sequint.nim and prefetch both/4 sides of the 2-way branching, you might be able to extend the range of BIST superiority by 4x or so, but once you get up to "GigaItems" hope will still be lost (with modern cache sizes, anyway).
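To pin down the BIST side of that claim, here is a generic Fenwick-tree-over-counts sketch in Nim: add/remove items by bin, exclusive prefix rank, and find-kth by descending power-of-two strides. It is textbook code written for this comment, not adix's own implementation, and it assumes the bin count is small enough that the whole tree stays cache-resident.

```nim
type
  Bist = object              # binary indexed sum tree over per-bin item counts
    tree: seq[int]           # 1-based Fenwick array; tree.len - 1 bins

proc initBist(nBins: int): Bist = Bist(tree: newSeq[int](nBins + 1))

proc add(b: var Bist; bin, delta: int) =
  ## Add `delta` items to 0-based `bin`; O(log nBins). Use delta = -1 to remove one.
  var i = bin + 1
  while i < b.tree.len:
    b.tree[i] += delta
    i += i and (-i)

proc rank(b: Bist; bin: int): int =
  ## Count of items strictly below `bin` (exclusive prefix sum).
  var i = bin
  while i > 0:
    result += b.tree[i]
    i -= i and (-i)

proc kth(b: Bist; k: int): int =
  ## Bin holding the k-th smallest item (0-based k), descending the tree
  ## by power-of-two strides so no per-level rank() calls are needed.
  var pos = 0
  var rem = k + 1
  var step = 1
  while step * 2 < b.tree.len: step *= 2
  while step > 0:
    let nxt = pos + step
    if nxt < b.tree.len and b.tree[nxt] < rem:
      rem -= b.tree[nxt]
      pos = nxt
    step = step shr 1
  pos                        # bins 0..pos-1 hold fewer than k+1 items => k-th is in bin pos

when isMainModule:           # tiny moving-window-ish usage
  var b = initBist(8)
  for x in [3, 1, 3, 5]: b.add(x, 1)
  doAssert b.kth(0) == 1 and b.kth(2) == 3 and b.kth(3) == 5
  b.add(3, -1)               # slide the window: drop one item from bin 3
  doAssert b.kth(2) == 5
```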

I mentioned some of this in private correspondence back on Mar 23; I'm not sure you saw it. Anyway, I think you & I have a variety of common interests and should probably talk more.