dosyago / dn

πŸ’Ύ dn - offline full-text search and archiving for your Chromium-based browser.
https://localhost:22120

Type error upon loading - index.json getting clobbered #103

Closed marcboivin closed 3 months ago

marcboivin commented 2 years ago

Great idea, I was dumping as much stuff as I could into it because it solves a real problem I have.

Thing is, I realized my index didn't have everything in it. Looking at the archive folder, all the pages were there.

Tried to open a new session and started re-indexing content. Still the same issue: not everything was showing up in the index.

Started a 3rd time, and got:

TypeError: Cannot destructure property 'id' of 'Mo.Index.get(...)' as it is undefined.
    at /Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8330661
    at Array.map (<anonymous>)
    at n.flex (/Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8330639)
    at Object.search (/Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8330865)
    at async /Users/mboivin/.nvm/versions/node/v16.13.1/lib/node_modules/diskernet/build/22120.js:3:8357152

Now I can't use the tool and my indexed content is unusable.

Attached is the MASSIVE error log I got from trying to restart diskernet.

Any way to solve this? out.log

Thanks

o0101 commented 2 years ago

Thank you! I'm really sorry about this issue. I've seen it as well.

I still have not isolated the cause.

Basically what has happened is the index.json file has been clobbered.

So all your cached resources are still there, and I believe the cache.json file should still be OK.

This is a really terrible thing to happen to your index, I'm sorry!

I don't have a solution right now but I believe it may be possible to rebuild the index.json file and recover it.

A patch I'm intending to release will keep a backup index.json and recover from it if the main copy gets clobbered. It will also check before any write that we are not overwriting an existing index, and in any case save the existing one out to the backup before writing.

I still can't isolate where index.json is overwritten with an empty copy, even though there are only a couple of places where that write occurs.

o0101 commented 2 years ago

Thanks again for the report @marcboivin ! I really appreciate it and I'm very sorry for you that this happened 😒

o0101 commented 2 years ago

I just checked out the out.log -- that is an impressively long error isn't it πŸ˜‚ πŸ˜†

It's basically just dumped the project's entire bundled JavaScript out of the executable. I'm still not sure why that happens on crash -- it used to happen with nexe and still occurs with pkg.

I think it's happening because it's trying to output the line where the error occurred. But of course the built JS is all one single "line" (8 MB long...).

Anyway, this is not the cause of the crash / corruption.

marcboivin commented 2 years ago

(Edit: corrected typo)

I can confirm the cache looks intact.

I could rebuild the index. Don't mind trying at least.

Pretty sure it's looking for an array index that doesn't exist, because my index.json is nothing like what I would expect it to be:


[
  [
    "http://www.lockwiki.com/index.php/Main_Page",
    {
      "date": 1641520105000,
      "id": 4,
      "ndx_id": 1000016,
      "title": "Lockwiki"
    }
  ],
  [
    4,
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "http://bjoernkarmann.dk/project_alias",
    {
      "date": 1641520105765,
      "id": 6,
      "ndx_id": 1000017,
      "title": "BjΓΈrn Karmann β€Ί project_alias"
    }
  ],
  [
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer",
    {
      "date": 1641520104911,
      "id": 5,
      "ndx_id": 1000015,
      "title": "The Digital Services Playbook β€” from the U.S. Digital Service"
    }
  ],
  [
    "ndx1000003",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    5,
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000004",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    6,
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000005",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000006",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000007",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000008",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000009",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000010",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000011",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000012",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000013",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000014",
    "http://bjoernkarmann.dk/project_alias"
  ],
  [
    "ndx1000015",
    "https://playbook.cio.gov/?utm_content=buffere045d&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer"
  ],
  [
    "ndx1000016",
    "http://www.lockwiki.com/index.php/Main_Page"
  ],
  [
    "ndx1000017",
    "http://bjoernkarmann.dk/project_alias"
  ]
]
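For what it's worth, this looks like the entries array of a JavaScript Map (i.e. JSON.stringify([...map.entries()])), mixing URL → record entries with id → URL and "ndx…" → URL aliases. Assuming that format (which is a guess on my part), a small probe like this would flag any alias whose target URL has no record -- exactly the kind of lookup that makes Index.get(...) come back undefined in the trace above:

```javascript
// Given the parsed index.json contents (an array of [key, value] pairs),
// return the aliases that point at a URL with no record in the Map.
// Assumes the entries-array format guessed above.
function findDangling(entries) {
  const index = new Map(entries);
  const dangling = [];
  for (const [key, value] of index) {
    // id -> URL and "ndx…" -> URL aliases store a string; the URL itself
    // should map to a record object. If it doesn't, destructuring `id`
    // from Index.get(url) throws, as in the TypeError above.
    if (typeof value === 'string' && !index.has(value)) {
      dangling.push([key, value]);
    }
  }
  return dangling;
}
```

Running that over a healthy index should print nothing; on my clobbered one I'd expect it to list the broken aliases.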
o0101 commented 2 years ago

That's awesome, how did you rebuild the index??

marcboivin commented 2 years ago

By hand my good sir.

Backed up the public folder and with a bit of ls and grep magic figured out all the URLs I wanted to index and revisited them ;)

Doing that, I found out some of the archive was still not processed, and diskernet asked if I wanted to recover. I did and got some back.

So something tells me the process fails at some point, but we have no way of knowing when.

Also I noted that one folder was corrupted and Finder (I'm on a Mac) wouldn't open the folder.

marcboivin commented 2 years ago

If you're curious, I used this as a starting point

grep -r GEThttp ./ | cut -d ':' -f 4 | cut -d '?' -f 1

o0101 commented 2 years ago

> By hand my good sir.
>
> Backed up the public folder and with a bit of ls and grep magic figured out all the URLs I wanted to index and revisited them ;)
>
> Doing that, I found out some of the archive was still not processed, and diskernet asked if I wanted to recover. I did and got some back.
>
> So something tells me the process fails at some point, but we have no way of knowing when.
>
> Also I noted that one folder was corrupted and Finder (I'm on a Mac) wouldn't open the folder.

You're awesome! That's so good! πŸ˜† πŸ˜‚ ✊🏻 !!