HackSandbox / hacksandbox-online-editor

Web IDE for HackSandbox
http://hacksandbox.com

Only store diff in database. #2

Open junzhengca opened 7 years ago

junzhengca commented 7 years ago

Currently we store the full source for every fork, which is very wasteful. Is there a way we could store only the diff? Since we know each fork's parent, we can apply the diff to the parent's code and give users their code.

The problem with this is that if a fork is many saves away from the base, reconstructing it will be slow.
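A minimal sketch of storing only a diff against the parent, assuming line-based diffs. The op format `(start, end, replacement_lines)` and the names `make_diff` / `apply_diff` are hypothetical, chosen for illustration; `difflib.SequenceMatcher` is from the Python standard library.

```python
import difflib
from typing import List, Tuple

# Hypothetical diff format: each op means
# "replace parent_lines[start:end] with replacement_lines".
Op = Tuple[int, int, List[str]]


def make_diff(parent: str, child: str) -> List[Op]:
    """Keep only the changed regions needed to turn parent into child."""
    a = parent.splitlines(keepends=True)
    b = child.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    ops: List[Op] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            ops.append((i1, i2, b[j1:j2]))
    return ops


def apply_diff(parent: str, ops: List[Op]) -> str:
    """Rebuild the child's source from the parent's source plus the stored diff."""
    lines = parent.splitlines(keepends=True)
    # Apply from the bottom up so the indices of earlier ops stay valid.
    for start, end, replacement in reversed(ops):
        lines[start:end] = replacement
    return "".join(lines)
```

Only the output of `make_diff` would go into the database; the fork's full source is recovered with `apply_diff(parent_source, ops)`. This is exactly why read cost grows with distance from the base: `parent_source` may itself have to be rebuilt the same way.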

junzhengca commented 7 years ago

In the graph above, bold text indicates a repository identifier and S indicates a commit identifier (save identifier). Each save stores only a diff rather than the full repository.

Let's say we now want to get 128dbece at save 58791a. We have to reconstruct the code by replaying every save along this path: 8e0069cba(8a8d81) -> 8e0069cba(ab61f1) -> 87f294ed(5dc5e5) -> 87f294ed(1e5352) -> 93574bfc(7cbe7) -> 93574bfc(8c3ef) -> 93574bfc(7dea1) -> 93574bfc(5d3ea) -> 128dbece(75caeb) -> 128dbece(58791a)

This can be very slow once the tree gets large, because the number of diffs to replay grows with the depth of the save.
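A rough sketch of that brute-force read path, assuming each save records its repository, its parent save and a line-based diff. The `Save` record, the op format and `reconstruct` are hypothetical names for illustration, not the editor's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical diff op: replace lines[start:end] with the given lines.
Op = Tuple[int, int, List[str]]


@dataclass
class Save:
    repo_id: str           # e.g. "128dbece"
    save_id: str           # e.g. "58791a"
    parent: Optional[str]  # save_id of the parent save, None for the root
    diff: List[Op] = field(default_factory=list)


def apply_diff(lines: List[str], ops: List[Op]) -> List[str]:
    out = list(lines)
    for start, end, replacement in reversed(ops):  # bottom-up keeps indices valid
        out[start:end] = replacement
    return out


def reconstruct(saves: Dict[str, Save], save_id: str) -> Tuple[str, int]:
    """Brute force: walk up to the root, then replay every diff root-first.

    Returns the source text and how many diffs were applied."""
    path: List[Save] = []
    current: Optional[str] = save_id
    while current is not None:  # may cross repository (fork) boundaries
        save = saves[current]
        path.append(save)
        current = save.parent
    lines: List[str] = []
    for save in reversed(path):
        lines = apply_diff(lines, save.diff)
    return "".join(lines), len(path)
```

For the path listed above, reading 128dbece(58791a) would apply all ten diffs, so the cost is linear in the depth of the save.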

If you guys have time, can you think of a way to optimize this?

Don't suggest Git: it is slow for this because it keeps only the most recent copy in full and stores older versions as deltas. We must guarantee that every save point can be reconstructed very fast.
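One possible direction (not something decided in this thread) is periodic checkpointing: store a full snapshot every `interval` saves along each path and diffs in between, so any read replays at most `interval - 1` diffs at the cost of extra storage. A minimal sketch, assuming the same hypothetical line-based op format; `SaveStore`, `interval` and `distance` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Op = Tuple[int, int, List[str]]  # hypothetical: replace lines[start:end]


@dataclass
class Save:
    save_id: str
    parent: Optional[str]
    diff: List[Op] = field(default_factory=list)
    snapshot: Optional[List[str]] = None  # full copy, only on checkpoint saves
    distance: int = 0  # diffs to replay from the nearest checkpoint ancestor


def apply_diff(lines: List[str], ops: List[Op]) -> List[str]:
    out = list(lines)
    for start, end, replacement in reversed(ops):
        out[start:end] = replacement
    return out


class SaveStore:
    def __init__(self, interval: int = 8) -> None:
        self.interval = interval  # hypothetical tuning knob
        self.saves: Dict[str, Save] = {}

    def add(self, save_id: str, parent: Optional[str], diff: List[Op]) -> None:
        distance = 0 if parent is None else self.saves[parent].distance + 1
        save = Save(save_id, parent, diff, distance=distance)
        self.saves[save_id] = save
        if parent is None or distance >= self.interval:
            # Materialize a full copy so later reads can stop here.
            save.snapshot, _ = self._replay(save_id)
            save.distance = 0

    def _replay(self, save_id: str) -> Tuple[List[str], int]:
        """Walk up only to the nearest snapshot, then replay the diffs below it."""
        path: List[Save] = []
        save: Optional[Save] = self.saves[save_id]
        while save is not None and save.snapshot is None:
            path.append(save)
            save = self.saves[save.parent] if save.parent is not None else None
        lines = list(save.snapshot) if save is not None else []
        for s in reversed(path):
            lines = apply_diff(lines, s.diff)
        return lines, len(path)

    def read(self, save_id: str) -> Tuple[str, int]:
        lines, replayed = self._replay(save_id)
        return "".join(lines), replayed
```

Because forks form a tree, `distance` is tracked per path, so a fork that branches off deep in another repository still hits a snapshot within `interval - 1` steps. Smaller intervals mean faster reads and more stored snapshots.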

junzhengca commented 7 years ago

I will implement the brute-force method for now, but we will definitely have to optimize it in the future.