khanken opened this issue 4 months ago
Thanks for reporting this. We don't export all the tables, but we do figure it's useful to have a fairly complete schema. The ordering is definitely an issue. If that's something you're game to fix, we'd welcome that.
There's a PR from a few minutes ago that may have some of these fixes too: #4223.
I think it fixes the missing race table and the missing schema files. The author mentioned the issue with the foreign keys being out of order, but I don't think their PR has the fix for that yet.
@khanken my MR should fix your first three issues, although I don't think it will help with the fourth. From what I can tell, the load-bulk-data-2024-05-07.sh script loads tables in the order they are defined in the array, so I have ordered them in a way that shouldn't trigger any FK errors when the script is run.
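To make the ordering concrete, here is a minimal sketch of the pattern (the array and filenames are illustrative placeholders; the real list lives in load-bulk-data-2024-05-07.sh):

```bash
#!/usr/bin/env bash
# Sketch of an FK-safe load loop: parent tables are listed before the
# tables that reference them. The array and filename pattern here are
# illustrative; see load-bulk-data-2024-05-07.sh for the real list.
tables=(
  search_court    # parents first: courts are referenced by dockets
  search_docket   # children after: dockets have an FK to search_court
  # ...remaining tables in dependency order
)

for table in "${tables[@]}"; do
  echo "Loading ${table}..."
  psql -v ON_ERROR_STOP=1 "${DATABASE_URL}" \
    -c "\copy public.${table} FROM '${table}-2024-05-07.csv' WITH (FORMAT csv, HEADER)"
done
```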
Thank you so much for the quick response! It is understandable that you do not provide all the data.
I will be more than happy to fix the shell script once I have all the schemas required to load the data. I am stuck loading the search_sockets table; I have restarted the process with no success so far.
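For reference, this is roughly what I am running for that one table (paths are placeholders; the ON_ERROR_STOP flag just makes psql stop at the first error so the failing statement is visible):

```bash
# Roughly the command I am running for the one problem table.
# Paths are placeholders. -v ON_ERROR_STOP=1 makes psql exit at the
# first error instead of scrolling past it.
TABLE=search_sockets   # the table that keeps failing for me
psql -v ON_ERROR_STOP=1 "${DATABASE_URL}" \
  -c "\copy public.${TABLE} FROM '${TABLE}-2024-05-07.csv' WITH (FORMAT csv, HEADER)"
```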
I was wondering if I needed a better computer for the large tables, so I asked a friend to spin up a VM with RHEL 9 on his server yesterday. I am planning to install Postgres on it and give it a try.
But I think he is tied up by the CrowdStrike outage. He texted me that he "had a busy day" at 7 AM this morning, so I am guessing he is busy putting out fires. I don't know when I can get a new server to run this.
It is not a difficult fix at all. I could have fixed it just by looking at the schemas, but you know how it goes with scripts: I would like to run everything successfully before committing and sharing the fix.
I am waiting on the hardware right now. Meanwhile, I have to move on to other parts of my project. I will share the fix as soon as I can get access to a better server for my database.
Thanks!
Kelly
Sorry, I am not able to access my personal computer during the day; I can only check email, so I did not see this until now. Thank you so much for the fix! And I want you to know that this project and all of your work are much appreciated!
P.S. What are the minimum hardware requirements for running this database?
P.S. What are the minimum hardware requirements for running this database?
I think it's around 500GB, but honestly, we have lots of other stuff in our DB, so it's hard to say. It takes a big machine though.
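If you want to see where the space actually goes once the data is loaded, standard Postgres catalog queries will tell you; nothing here is CourtListener-specific:

```bash
# Plain Postgres catalog queries: overall database size, then the ten
# largest tables including their indexes and TOAST data.
psql "${DATABASE_URL}" <<'SQL'
SELECT pg_size_pretty(pg_database_size(current_database()));

SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
SQL
```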
I have been having issues loading the large files. I am wondering if you could chunk the data to under 2GB per file when exporting? It is not easy to chunk the CSV files on my end, since a split could leave rows broken across two separate files.
Or do you have any suggestions for loading large files?
You can chunk on your side, if that's helpful. I think we'd prefer it that way.
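Something like this should work (untested sketch; the filename is a placeholder). GNU split's -C option only breaks at line boundaries, so no row gets cut in half, and the loop re-attaches the header to each piece. One caveat: if a quoted field contains an embedded newline, a line-based split can still cut through it, and you'd need a CSV-aware splitter instead.

```bash
# Untested sketch: split one big CSV into pieces comfortably under 2GB
# without breaking rows, then prepend the header line to every piece.
# Caveat: assumes no quoted field contains an embedded newline.
BIG=some_table-2024-05-07.csv   # placeholder filename
head -n 1 "$BIG" > header.csv
tail -n +2 "$BIG" | split -C 1900M - chunk_
for f in chunk_*; do
  cat header.csv "$f" > "$f.csv" && rm "$f"
done
```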
load-bulk-data-2024-05-07.sh is not working for me:
Thank you so much for all you have done! I really appreciate it!