Alex-At-Home closed 3 years ago
for i in $(find /Users/alex/websites/NCAA_by_conf -name "old.zip" | grep "/2020/"); do unzip -l $i | grep box_score | grep -E "\s+[0-9][0-9]\s+" && echo $i; done
is finding them, now to delete them!
Example deletion to automate:
zip -d /Users/alex/websites/NCAA_by_conf/american/2020/Memphis_505725/hts-cache/old.zip "https://stats.ncaa.org/contests/1992809/box_score"
and then the same line but for new.zip
(Some teams still have length-23 box scores even after delete and re-import)
Also there was a 1pt Galin -> Hart error in the original box score, trying to decide whether to re-import
Hacky script to remove all such box scores: for i in $(find /Users/alex/websites/NCAA_by_conf -name "*.zip" | grep "/2020/"); do j=$(unzip -l $i | grep box_score | grep -E "\s+[0-9][0-9]\s+" | grep -o 'https:.*') && echo "$i /// $j" && zip -d $i "$j"; done
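A slightly more defensive variant of the hacky script, as a sketch only: it deletes one entry at a time (so an archive holding several stub box scores is handled), quotes paths, and has a dry-run switch. The helper names `stub_entries` and `purge_stubs` and the `DRY_RUN` variable are made up for illustration, and it assumes the entry names are the URLs as in the `zip -d` example above and that the stubs always show a two-digit length in `unzip -l`.

```shell
# Sketch only: stub_entries, purge_stubs and DRY_RUN are hypothetical names.

# Read an `unzip -l` listing on stdin and print the names of box_score
# entries whose Length column is exactly two digits (the suspected stubs).
stub_entries() {
  awk '$1 ~ /^[0-9][0-9]$/ && $NF ~ /box_score/ { print $NF }'
}

# Delete stub box scores from every *.zip under ROOT whose path contains
# /SEASON/. With DRY_RUN set, only print what would be deleted.
# The body runs in a subshell so its variables stay local.
purge_stubs() (
  root=$1 season=$2
  find "$root" -name '*.zip' -path "*/${season}/*" | while IFS= read -r archive; do
    # Capture the listing first so the archive is not modified while being read.
    entries=$(unzip -l "$archive" | stub_entries)
    [ -n "$entries" ] || continue
    printf '%s\n' "$entries" | while IFS= read -r entry; do
      echo "$archive /// $entry"
      [ -n "${DRY_RUN:-}" ] || zip -d "$archive" "$entry" </dev/null
    done
  done
)
```

Usage would be something like `DRY_RUN=1 purge_stubs /Users/alex/websites/NCAA_by_conf 2020` to check the list first, then the same call without `DRY_RUN` to actually delete.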
Fixed with the commit above a while back
The Maryland - St Peter's game got given a "bad" box score/PbP URL (only got filled with data later)
And I didn't pick it up again because:
Search for the 23 examples (maybe 2 digit bytes) and fix manually, and maybe add something to the import logic to redownload those automatically?