csm opened this issue 4 years ago
Very nice work! I have just tried to run the konserve tests, but I seem to need an S3 account (?).
Have you made any progress on this issue? We would really like to support S3 with datahike.
For testing the best option is probably s4, which I wrote for this exact use case (for extra fun s4 also uses konserve as its own backend, so you could possibly build a tower of systems each going through s3-via-konserve 😏 )
I was also recently experimenting with writing hitchhiker trees directly to S3 along with doing my own tx-log in dynamodb.
Also, I split out the konserve part into its own repo.
Yes, I have just seen this. Pretty cool. If konserve-ddb-s3 works, then we would effectively have a general-purpose hitchhiker-tree with S3 support, but I assume you have realized this. The roots of the tree are written by datahike explicitly to the same backend, but you could totally use DynamoDB for this. I still have to comprehend all the S3 bits you are juggling in the konserve backend.
Btw., do you still have a handle on the Clojurians Slack?
s4 is super cool! :heart_eyes: Can you reproduce this issue with s4? That would maybe indicate it is a konserve issue independent of the S3 internals.
Your konserve tests look like they cover nesting and unnesting with `update-in` and `get-in`. It would be interesting to see what the persistent map that is mistakenly popping up looks like. I think a serializer issue is most likely. Btw. :+1: for the lz4 compression, I think that would be cool to have in konserve in general.
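For reference, a minimal sketch of the kind of nested round-trip those tests exercise, against an in-memory konserve store (assuming the classic async konserve API, where every operation returns a core.async channel):

```clojure
(require '[konserve.memory :refer [new-mem-store]]
         '[konserve.core :as k]
         '[clojure.core.async :refer [<!!]])

;; All konserve operations return a core.async channel,
;; so we block with <!! at the REPL.
(def store (<!! (new-mem-store)))

;; Write a nested value, update it in place, then read it back.
(<!! (k/assoc-in store [:tree :left] {:ops []}))
(<!! (k/update-in store [:tree :left :ops] conj {:op :insert}))
(<!! (k/get-in store [:tree :left]))
;; expected: {:ops [{:op :insert}]}
```

If a backend's serializer round-trips values incorrectly, this is exactly the kind of test where a record would come back as a plain `PersistentArrayMap`.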
It's possible I fixed this issue in the various iterations of the code; unfortunately I don't remember if I still saw this issue in the last tests I ran.
So it's possible this issue just fell away while I was iterating on the implementation.
So this issue is not popping up for you when you run your tests? If I understand correctly, to test it I have to set the proper credentials, e.g. through environment variables: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html ?
Look at this example script to start DynamoDB and S4 locally, storing data to the filesystem: https://github.com/csm/datahike-s3/blob/master/examples/mbrainz.clj. You don't need AWS credentials to use the local dynamodb/s3 servers.
That AWS docs link should work for specifying real AWS credentials for most cases; this uses the cognitect AWS client. I know that both the default credentials profile file and instance profile credentials work.
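Concretely, with the Cognitect aws-api you can either rely on the default credentials chain (environment variables, the profile file, instance profile credentials) or pass a provider explicitly. A hedged sketch (the region and key values are placeholders):

```clojure
(require '[cognitect.aws.client.api :as aws]
         '[cognitect.aws.credentials :as credentials])

;; Default: the client resolves credentials from the usual chain
;; (env vars, ~/.aws/credentials, instance profile).
(def s3 (aws/client {:api :s3 :region "us-east-1"}))

;; Or supply credentials explicitly:
(def s3-explicit
  (aws/client {:api    :s3
               :region "us-east-1"
               :credentials-provider
               (credentials/basic-credentials-provider
                {:access-key-id     "AKIA..."       ; placeholder
                 :secret-access-key "..."})}))      ; placeholder

(aws/invoke s3 {:op :ListBuckets})
```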
Ok, thanks for providing background. I seem to get an error about using reflection to mess with Unsafe, and a region error, but I guess they are Java-version related (probably I need an older JDK or something; I will check that tomorrow). https://pastebin.com/vVC25pmE
I started doing some further work in these repos: the store keeps the `:db` and `:ops` keys in DynamoDB, and stores everything else in S3. The idea is that as ops are added to the index node, they are put in DynamoDB, avoiding churn of objects written to S3.
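The routing idea can be sketched roughly like this (a hypothetical illustration, not the actual konserve-ddb-s3 code; `dynamo-store` and `s3-store` are assumed to be two konserve-compatible stores):

```clojure
;; :db and :ops are small and written on every transaction, so they go to
;; DynamoDB; everything else (the large, immutable tree nodes) goes to S3.
(defn choose-store
  [dynamo-store s3-store k]
  (if (contains? #{:db :ops} k)
    dynamo-store
    s3-store))
```

Dispatching on the key this way keeps the write-hot, mutable bookkeeping out of S3, where rewriting small objects on every transaction would be slow and churn-heavy.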
Something, somewhere seems to be turning a hitchhiker-tree node into a PersistentArrayMap, causing this after transacting a bunch of data:
I think the fressian serializer is set up correctly, so it's likely the logic in `-update-in`.