ffdev-info / wikidp-issues

An issues repository for resolving issues in Wikidata around the records relating to Digital Preservation
GNU General Public License v3.0

Wikidata skeleton file count is too high #3

Closed ross-spencer closed 3 years ago

ross-spencer commented 4 years ago

Description of problem

We expect Roy to output fewer skeleton files at runtime. The log reports 96, but we're getting 124. It may just be that I am looping through some values incorrectly (and, if so, likely outputting files that won't be identified). We should find out.

More information

ross-spencer@siegfried:~/git/richardlehane/siegfried/cmd/sf$ cd wikidata-skeleton-suite-demo-v1.0/

ross-spencer@siegfried:~/git/richardlehane/siegfried/cmd/sf/wikidata-skeleton-suite-demo-v1.0$ ls
Q10287816.gz    Q1326659.vhdx    Q194831.cab     Q261907.xcf     Q27967488.flv       Q29000585.dex  Q4039139.gho   Q55429220.pyc  Q55429627.pdf      Q921122.jar
Q1047541.eps    Q1341482.exr     Q201093.ra      Q2627217.dylib  Q27979531.plist     Q29000599.lnk  Q42332.pdf     Q55429234.pyc  Q5974466           Q931783.jpg2
Q1056154.m4v    Q1343830.axf     Q2193155.class  Q2693033.arj    Q28048413.cin       Q29167848.dbx  Q42591.mp3     Q55429245.pyc  Q61762755          Q939636.cdr
Q1072083.wmf    Q1381134.mif     Q2332937.wpl    Q27229565.png   Q28205479.info      Q29168314.mar  Q4746193.adz   Q55429254.pyc  Q61762868          Q942350.qt
Q1076355.ocx    Q1428303.fits    Q24834502.ppt   Q27229608.png   Q28205771.tif       Q295711.ani    Q47462053.sig  Q55429271.pyc  Q61762936
Q1093556.xar    Q1569639.iff     Q26085317.pdf   Q27229642.png   Q28206109.ff        Q29650308.prc  Q47462074      Q55429287.pyc  Q61762985
Q11241282.rpm   Q162839.xz       Q26085319.pdf   Q27863188.adts  Q28206114.fbm       Q2997216.caf   Q47524710      Q55429299.pyc  Q61998186.feather
Q1143961.jbig2  Q1676669.jpe     Q26085322.pdf   Q27866048.cr2   Q28206162.ximg      Q309440.webm   Q475488.epub   Q55429313.pyc  Q672985.snd
Q1144005.hlp    Q178051.png      Q26085326.pdf   Q27866052.bz2   Q28206695.pgm       Q336316.mp4    Q5013743.cpt   Q55429332.pyc  Q751800.tte
Q1196805        Q18413771.woff2  Q26085330.pdf   Q27881556.flac  Q28206822.pspbrush  Q35221401.rar  Q50288190.pyc  Q55429341.pyc  Q85836636.evy
Q1228757.dmg    Q18640977.bpg    Q26085333.pdf   Q27966964.4xm   Q283579.tar         Q35221946.rar  Q5156830       Q55429354.pyc  Q877050.iso
Q1228770.dms    Q188199.oga      Q26085336.pdf   Q27967410.voc   Q28777700.mar       Q3651247.crw   Q5381415.evy   Q55429372.pyc  Q913946.nsf
Q1238229.stl    Q1893311.mxf     Q26085339.pdf   Q27967444.fli   Q28858032.doc       Q368782.lha    Q55429163.pyc  Q55429382.pyc  Q918221.woff

ross-spencer@siegfried:~/git/richardlehane/siegfried/cmd/sf/wikidata-skeleton-suite-demo-v1.0$ ls -la | wc -l
124
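
A note on the count: `ls -la | wc -l` also counts the `total` line and the `.` and `..` entries, so 124 lines corresponds to 121 directory entries, which matches the 121 names in the listing above. That is still more than the 96 the log reports. As a rough cross-check, the entries can be counted directly. This is only a sketch; the function name `count_entries` is made up for illustration and is not part of Roy or siegfried.

```python
import os


def count_entries(directory):
    """Count the entries in a directory.

    os.scandir never yields the '.' and '..' entries, so the result is
    not inflated the way `ls -la | wc -l` is.
    """
    with os.scandir(directory) as entries:
        return sum(1 for _ in entries)
```

Calling `count_entries("wikidata-skeleton-suite-demo-v1.0")` from the `cmd/sf` directory would give the number to compare against the figure in the log.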

ross-spencer commented 3 years ago

Skeleton output begins here at line 219. I believe the issue comes from a fairly typical mistake (for me at least): modifying a data structure while looping over it, in this case by deleting values inside the loop. Because the same loop also outputs values based on that data structure, the values have not yet been deleted when we reach those decision points; they were, and still remain, in memory until we exit that context, and only then are they truly gone. This is not hugely important, and it has been rectified for our tests. Skeleton output isn't a feature that will be available in the final cut of this work; it was purely a convenience.
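
The class of mistake described above can be sketched roughly as follows. This is not the Roy code (which is written in Go); the record layout, in which each key maps to a list of other keys it causes to be dropped, and the function names `single_pass` and `two_pass` are assumptions made purely for illustration.

```python
def single_pass(records):
    """Output and delete in the same loop (the problematic pattern)."""
    live = dict(records)
    written = []
    # The keys are snapshotted before any deletion has happened.
    for key in list(live):
        # The output decision is made from the snapshot, not from `live`.
        written.append(key)
        for dropped in records[key]:
            # The deletion only takes effect on `live`, too late for
            # the snapshot that drives the output.
            live.pop(dropped, None)
    return written


def two_pass(records):
    """Work out every deletion first, then output what remains."""
    dropped = {d for targets in records.values() for d in targets}
    return [key for key in records if key not in dropped]


# Hypothetical data: Q2 causes Q1 and Q3 to be dropped.
records = {"Q1": [], "Q2": ["Q1", "Q3"], "Q3": []}
print(single_pass(records))  # ['Q1', 'Q2', 'Q3']
print(two_pass(records))     # ['Q2']
```

Separating the filtering from the output means the output step only ever sees values that survived, which is one way an inflated file count like the one in this issue could be avoided.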