Alex-At-Home / cbb-on-off-analyzer

A not-so-simple-any-more! SPA for rendering college basketball on/off analysis
https://cbb-on-off-analyzer.now.sh
Apache License 2.0

Search caches for delayed game play by plays #138

Closed Alex-At-Home closed 3 years ago

Alex-At-Home commented 3 years ago

The Maryland - St Peter's game was given a "bad" box score/PbP URL (it only got filled with data later).

And I didn't pick it up again because the cached copy is just a 23-byte stub (see the 1993837 entry):

(No errors, 0 warnings, 0 messages)
  Length      Date    Time    Name
---------  ---------- -----   ----
    82404  11-28-2020 18:37   https://stats.ncaa.org/contests/1986957/box_score
    87275  11-28-2020 18:37   https://stats.ncaa.org/contests/1989471/box_score
    69868  11-30-2020 13:54   https://stats.ncaa.org/contests/1982828/box_score
       23  12-04-2020 23:35   https://stats.ncaa.org/contests/1993837/box_score
    82224  11-28-2020 18:37   https://stats.ncaa.org/game/box_score/4981922?period_no=1
   126297  11-28-2020 18:37   https://stats.ncaa.org/game/play_by_play/4981922
    87190  11-28-2020 18:37   https://stats.ncaa.org/game/box_score/4982345?period_no=1
   107707  11-28-2020 18:37   https://stats.ncaa.org/game/play_by_play/4982345
    69724  11-30-2020 13:54   https://stats.ncaa.org/game/box_score/4982774?period_no=1
   102019  11-30-2020 13:54   https://stats.ncaa.org/game/play_by_play/4982774
---------                     -------

Search for the 23-byte entries (or maybe anything with a 2-digit byte count) and fix them manually, and maybe add something to the import logic to re-download those automatically?
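A minimal sketch of that search, assuming the listing format shown above (length in the first column, URL in the fourth). The function name `find_stub_box_scores` and the "under 100 bytes" threshold are illustrative assumptions, not part of the import logic:

```shell
# Hypothetical helper: reads an `unzip -l` listing on stdin and prints the
# URL of every box_score entry whose length is below 100 bytes.
find_stub_box_scores() {
  # $1 = length, $4 = entry name; header, separator and total lines are
  # skipped because their first field is non-numeric or their fourth field
  # is not a box_score URL.
  awk '$1 ~ /^[0-9]+$/ && $4 ~ /box_score/ && ($1 + 0) < 100 { print $4 }'
}
```

Usage would be something like `unzip -l old.zip | find_stub_box_scores`.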

Alex-At-Home commented 3 years ago

This is finding them, now to delete them:

for i in $(find /Users/alex/websites/NCAA_by_conf -name "old.zip" | grep "/2020/"); do unzip -l "$i" | grep box_score | grep -E "\s+[0-9][0-9]\s+" && echo "$i"; done

Alex-At-Home commented 3 years ago

Example deletion to automate:

zip -d /Users/alex/websites/NCAA_by_conf/american/2020/Memphis_505725/hts-cache/old.zip "https://stats.ncaa.org/contests/1992809/box_score"

and then the same line but for new.zip
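One way the two deletions could be wrapped up, as a rough sketch: the function name `delete_cached_entry` and the `DRY_RUN` switch are made up for illustration, and it assumes both archives live in the same hts-cache directory as in the example above.

```shell
# Hypothetical helper: remove one cached URL from both cache archives.
# $1 = hts-cache directory, $2 = entry name (the URL) to delete.
# Set DRY_RUN=1 to print the zip commands instead of running them.
delete_cached_entry() {
  local cache_dir="$1"
  local entry="$2"
  local archive
  for archive in "$cache_dir/old.zip" "$cache_dir/new.zip"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "zip -d $archive $entry"
    elif [ -f "$archive" ]; then
      # Archives that do not exist are skipped silently.
      zip -d "$archive" "$entry"
    fi
  done
}
```

Running it with `DRY_RUN=1` first shows which archives would be touched. If the entry is missing from one of the two archives, zip should just report an error for that archive and the loop moves on to the next.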

Alex-At-Home commented 3 years ago

(Some teams still have 23-byte box scores even after delete and re-import)

Also there was a 1pt Galin -> Hart error in the original box score; still trying to decide whether to re-import

Alex-At-Home commented 3 years ago

Hacky script to remove all such box scores:

for i in $(find /Users/alex/websites/NCAA_by_conf -name "*.zip" | grep "/2020/"); do j=$(unzip -l "$i" | grep box_score | grep -E "\s+[0-9][0-9]\s+" | grep -o 'https:.*') && echo "$i /// $j" && zip -d "$i" "$j"; done

Alex-At-Home commented 3 years ago

Fixed with the commit above a while back