andrewbanchich / shreddit

Delete your Reddit data.
MIT License

Error with numeric subreddits in GDPR mode #80

Open timmc opened 1 year ago

timmc commented 1 year ago

shreddit reliably chokes on parsing comments.csv when the subreddit field is all-numeric, e.g. for /r/404 or /r/2012. For example, if I change ...,technology,... to ...,2012,... in the first record of my comments.csv in the GDPR export, I get the following error:

  2023-07-08T21:34:11.153874Z  INFO  Shredding Comments...
    at src/main.rs:52

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Deserialize { pos: Some(Position { byte: 63, line: 1, record: 1 }), err: DeserializeError { field: None, kind: Message("data did not match any variant of untagged enum Source") } })', src/sources/gdpr.rs:17:41
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
   4: shreddit::sources::gdpr::list::{{closure}}
             at ./src/sources/gdpr.rs:17:39
   5: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:310:13
...

Maybe the csv Reader is producing a numeric type when there's an all-numeric sequence, and then deserialize fails because the types don't match?
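
To make that hypothesis concrete, here is a toy model using only the standard library. It is not the csv or serde crates' actual code: the `Inferred` enum and the functions `infer`, `expect_string`, and `expect_string_lenient` are all hypothetical names made up for illustration. It only sketches what would happen if a field's type were guessed from its text before the target type is known, and a `String` consumer then refused anything that was guessed to be a number.

```rust
/// A field value after hypothetical type inference.
/// Toy model only; these are not types from the csv or serde crates.
#[derive(Debug, PartialEq)]
enum Inferred {
    Unsigned(u64),
    Text(String),
}

/// Guess a type from the raw field text: anything that parses as a
/// u64 is treated as a number, everything else stays text.
fn infer(raw: &str) -> Inferred {
    match raw.parse::<u64>() {
        Ok(n) => Inferred::Unsigned(n),
        Err(_) => Inferred::Text(raw.to_string()),
    }
}

/// Strict consumer: accepts only text, the way a `String` field
/// would if it were handed an already-inferred value.
fn expect_string(value: Inferred) -> Result<String, String> {
    match value {
        Inferred::Text(s) => Ok(s),
        Inferred::Unsigned(n) => Err(format!("expected a string, found integer {}", n)),
    }
}

/// Lenient consumer: converts an inferred number back into text.
fn expect_string_lenient(value: Inferred) -> String {
    match value {
        Inferred::Text(s) => s,
        Inferred::Unsigned(n) => n.to_string(),
    }
}
```

In this model, `expect_string(infer("technology"))` succeeds while `expect_string(infer("2012"))` returns an error, which mirrors the symptom above. The lenient variant accepts both, but it would turn a value like `007` into `7`, so reading the raw text without any inference would be the safer design if the real cause turns out to be similar.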

My current workaround is to remove the subreddit: String field from the Gdpr enum variant in comment.rs, as it's not currently used for anything. Removing the offending lines from the .csv should also work.

timmc commented 1 year ago

Note that this problem doesn't happen with the id field, which can also be all-numeric and is also supposed to be a String. This might have something to do with serde and enums.