Closed: griff-rees closed this issue 1 year ago.
Removed (but not purged) fixture-files/mitchells_db [v1].csv
Upon review, I don't think we need to remove the census data. It is available open-access through the UK Data Service… I believe we are able to re-share it (I wouldn’t have added it to the repo otherwise), and on revisiting CC BY 4.0, I see it states that we “are free to . . . copy and redistribute the material in any medium or format” (see here).
Looping in @claireaustin01 might be good regarding this bit, however.
Great, thanks @kallewesterling. Perhaps the safest option would be to download from that link automatically during a local deploy? Arguably that applies to many of these files.
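Here is a minimal Python sketch of "download automatically in a local deploy". It rests on a few assumptions. Each dataset is listed with a URL and an optional SHA-256 checksum. The `DATA_SOURCES` registry and the `ensure_downloaded` helper are hypothetical. The URL is a placeholder, not the real UK Data Service link. If a checksum is pinned, a deploy fails loudly when a provider changes or withdraws a file. That speaks to the long-term worry about relying on external services raised in this thread.

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path
from typing import Optional

# Hypothetical registry: dataset name -> (URL, expected SHA-256 or None).
# The URL is a placeholder, not the real UK Data Service link.
DATA_SOURCES = {
    "census": ("https://example.org/path/to/census-data.zip", None),
}


def ensure_downloaded(url: str, dest: Path, sha256: Optional[str] = None) -> Path:
    """Download `url` to `dest` unless `dest` already exists.

    Data is streamed into a temporary file in the same folder and only moved
    to `dest` once the download has finished (and the checksum matched, if
    one was given), so `dest` never holds a half-written file.
    """
    dest = Path(dest)
    if dest.exists():
        return dest  # fetched on an earlier deploy; nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    with urllib.request.urlopen(url) as response, tempfile.NamedTemporaryFile(
        dir=str(dest.parent), delete=False
    ) as tmp:
        while chunk := response.read(64 * 1024):
            digest.update(chunk)
            tmp.write(chunk)
    tmp_path = Path(tmp.name)
    if sha256 is not None and digest.hexdigest() != sha256:
        tmp_path.unlink()  # discard the bad download
        raise ValueError(f"checksum mismatch for {url}")
    tmp_path.replace(dest)  # same folder, so this is a rename
    return dest
```

A local deploy script could then loop over `DATA_SOURCES` and call `ensure_downloaded` for each entry. A fresh checkout would fetch the data itself instead of the repository redistributing it.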
A potential structure for managing the workflow, where the data folders hold the csv (etc.) files and the fixtures folders hold the generated json for the respective models:
newspapers
├── data
└── fixtures
mitchells
├── data
└── fixtures
gazetteer
├── data
└── fixtures
census
├── data
└── fixtures
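Assume each top-level folder in the layout above is a Django app (my reading of "the generated json for the respective models"). In that case Django's `loaddata` already searches every installed app's `fixtures` directory, so fixtures placed there need no extra `FIXTURE_DIRS` setting. Below is a small sketch of the layout with two hypothetical helpers. `scaffold` creates each app's `data` and `fixtures` folders. `fixture_path_for` maps a CSV under `data` to the JSON fixture it would be turned into; the naming is an assumption.

```python
from pathlib import Path
from typing import List

# App names as in the proposed layout; assumed to be the Django apps.
APPS = ("newspapers", "mitchells", "gazetteer", "census")


def scaffold(root: Path, apps=APPS) -> List[Path]:
    """Create <app>/data and <app>/fixtures for each app; return the folders."""
    folders = []
    for app in apps:
        for sub in ("data", "fixtures"):
            folder = Path(root) / app / sub
            folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
            folders.append(folder)
    return folders


def fixture_path_for(data_file: Path) -> Path:
    """Map <app>/data/<name>.csv to <app>/fixtures/<name>.json."""
    data_file = Path(data_file)
    if data_file.parent.name != "data":
        raise ValueError(f"{data_file} is not inside a data/ folder")
    return data_file.parent.parent / "fixtures" / (data_file.stem + ".json")
```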
Downloading automatically sounds like a good idea to me. As far as I can see, it would apply to the two publicly available datasets used here (if we're sticking with keeping the census data in there for now):
The scary thing about downloading files is, obviously, that the links depend on the services that provide them staying available long term, etc. You know all this, of course! :)
Well done! I was having a quick peek at those links and was annoyed trying to figure out the JS involved; thanks for sorting that.
Yeah, relying on those links is hard to maintain. I guess I'm thinking: maybe downloading addresses the re-sharing concern for now, and we can return to the question of including a final version of these in the repository once we've had enough time to decide what's OK.
Any thoughts on all this would be much appreciated, @claireaustin01.
I agree with that @griff-rees !
Hi @griff-rees, @kallewesterling, @claireaustin01,
The following files in this folder contain data from Wikidata and Geonames:
Wikidata: according to https://dumps.wikimedia.org/legal.html:
Copyrights of structured data in the main, Property, Lexeme, and EntitySchema namespaces are waived using the Creative Commons Zero (CC0) public domain dedication. All unstructured content in other namespaces is licensed under the Creative Commons Attribution-Share-Alike 3.0 License.
Geonames: according to http://download.geonames.org/export/dump/:
This work is licensed under a Creative Commons Attribution 4.0 License, see https://creativecommons.org/licenses/by/4.0/ The Data is provided "as is" without warranty or any representation of accuracy, timeliness or completeness.
So, as far as I can see, it should be fine.
I have backed up all the fixture files. My first attempt to purge them via https://rtyley.github.io/bfg-repo-cleaner/ raised the following errors:
$ git push
Enumerating objects: 43, done.
Counting objects: 100% (40/40), done.
Delta compression using up to 4 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (24/24), 16.97 KiB | 8.48 MiB/s, done.
Total 24 (delta 18), reused 15 (delta 9), pack-reused 0
remote: Resolving deltas: 100% (18/18), completed with 9 local objects.
To github.com:Living-with-machines/lwmdb
! [remote rejected] refs/pull/101/head -> refs/pull/101/head (deny updating a hidden ref)
! [remote rejected] refs/pull/102/head -> refs/pull/102/head (deny updating a hidden ref)
! [remote rejected] refs/pull/107/head -> refs/pull/107/head (deny updating a hidden ref)
! [remote rejected] refs/pull/107/merge -> refs/pull/107/merge (deny updating a hidden ref)
! [remote rejected] refs/pull/11/head -> refs/pull/11/head (deny updating a hidden ref)
! [remote rejected] refs/pull/12/head -> refs/pull/12/head (deny updating a hidden ref)
! [remote rejected] refs/pull/13/head -> refs/pull/13/head (deny updating a hidden ref)
! [remote rejected] refs/pull/15/head -> refs/pull/15/head (deny updating a hidden ref)
! [remote rejected] refs/pull/18/head -> refs/pull/18/head (deny updating a hidden ref)
! [remote rejected] refs/pull/19/head -> refs/pull/19/head (deny updating a hidden ref)
! [remote rejected] refs/pull/2/head -> refs/pull/2/head (deny updating a hidden ref)
! [remote rejected] refs/pull/20/head -> refs/pull/20/head (deny updating a hidden ref)
! [remote rejected] refs/pull/27/head -> refs/pull/27/head (deny updating a hidden ref)
! [remote rejected] refs/pull/28/head -> refs/pull/28/head (deny updating a hidden ref)
! [remote rejected] refs/pull/30/head -> refs/pull/30/head (deny updating a hidden ref)
! [remote rejected] refs/pull/33/head -> refs/pull/33/head (deny updating a hidden ref)
! [remote rejected] refs/pull/38/head -> refs/pull/38/head (deny updating a hidden ref)
! [remote rejected] refs/pull/39/head -> refs/pull/39/head (deny updating a hidden ref)
! [remote rejected] refs/pull/40/head -> refs/pull/40/head (deny updating a hidden ref)
! [remote rejected] refs/pull/41/head -> refs/pull/41/head (deny updating a hidden ref)
! [remote rejected] refs/pull/42/head -> refs/pull/42/head (deny updating a hidden ref)
! [remote rejected] refs/pull/43/head -> refs/pull/43/head (deny updating a hidden ref)
! [remote rejected] refs/pull/44/head -> refs/pull/44/head (deny updating a hidden ref)
! [remote rejected] refs/pull/46/head -> refs/pull/46/head (deny updating a hidden ref)
! [remote rejected] refs/pull/5/head -> refs/pull/5/head (deny updating a hidden ref)
! [remote rejected] refs/pull/57/head -> refs/pull/57/head (deny updating a hidden ref)
! [remote rejected] refs/pull/58/head -> refs/pull/58/head (deny updating a hidden ref)
! [remote rejected] refs/pull/59/head -> refs/pull/59/head (deny updating a hidden ref)
! [remote rejected] refs/pull/62/head -> refs/pull/62/head (deny updating a hidden ref)
! [remote rejected] refs/pull/63/head -> refs/pull/63/head (deny updating a hidden ref)
! [remote rejected] refs/pull/67/head -> refs/pull/67/head (deny updating a hidden ref)
! [remote rejected] refs/pull/68/head -> refs/pull/68/head (deny updating a hidden ref)
! [remote rejected] refs/pull/69/head -> refs/pull/69/head (deny updating a hidden ref)
! [remote rejected] refs/pull/7/head -> refs/pull/7/head (deny updating a hidden ref)
! [remote rejected] refs/pull/72/head -> refs/pull/72/head (deny updating a hidden ref)
! [remote rejected] refs/pull/73/head -> refs/pull/73/head (deny updating a hidden ref)
! [remote rejected] refs/pull/74/head -> refs/pull/74/head (deny updating a hidden ref)
! [remote rejected] refs/pull/77/head -> refs/pull/77/head (deny updating a hidden ref)
! [remote rejected] refs/pull/78/head -> refs/pull/78/head (deny updating a hidden ref)
! [remote rejected] refs/pull/8/head -> refs/pull/8/head (deny updating a hidden ref)
! [remote rejected] refs/pull/85/head -> refs/pull/85/head (deny updating a hidden ref)
error: failed to push some refs to 'github.com:Living-with-machines/lwmdb'
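Every rejected ref in that log is under refs/pull/. GitHub creates those refs for pull requests and treats them as read-only, which is what "deny updating a hidden ref" reports. The plain `git push` above may have come from a `git clone --mirror` copy, as in BFG's usage instructions. If so (an assumption), it tries to update every ref, including these. A commonly suggested workaround is to push only branches and tags. The sketch below builds such a push command from the repository's ref list. The helpers `local_refs`, `pushable_refs` and `push_command` are hypothetical, and nothing is pushed unless you run the resulting command yourself.

```python
import subprocess
from typing import Iterable, List

# Only branches and tags are pushed; refs/pull/* (read-only on GitHub)
# and local refs/remotes/* are skipped.
PUSHABLE_PREFIXES = ("refs/heads/", "refs/tags/")


def local_refs(repo: str = ".") -> List[str]:
    """List every ref name in `repo` via `git for-each-ref`."""
    result = subprocess.run(
        ["git", "-C", repo, "for-each-ref", "--format=%(refname)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def pushable_refs(refnames: Iterable[str]) -> List[str]:
    """Keep only refs under refs/heads/ and refs/tags/."""
    return [ref for ref in refnames if ref.startswith(PUSHABLE_PREFIXES)]


def push_command(refnames: Iterable[str], remote: str = "origin") -> List[str]:
    """Build a forced push of each kept ref to the same name on `remote`.

    --force is needed because rewritten history is not a fast-forward of
    what the remote currently holds.
    """
    refspecs = [f"{ref}:{ref}" for ref in pushable_refs(refnames)]
    return ["git", "push", "--force", remote, *refspecs]
```

In the rewritten clone, `print(" ".join(push_command(local_refs())))` shows the command so it can be reviewed before running. Branch protection on main would still block a forced push to it.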
This looks like a good place to start troubleshooting... It looks like it might be an issue with dropping files in a repo with open pull requests :/
@griff-rees, do you have the commands you tried with bfg, just so I don't redo exactly what you tried?
Thanks @AoifeHughes, pretty sure this is what I found worked best:
$ bfg --delete-files fixture-files lwmdb.git
For reference: I installed bfg via:
$ sudo snap install bfg-repo-cleaner --beta
on an Azure VM.
Just tried it with a slightly different command:
(playground) ➜ erase git clone git@github.com:Living-with-machines/lwmdb.git
Cloning into 'lwmdb'...
remote: Enumerating objects: 2319, done.
remote: Counting objects: 100% (351/351), done.
remote: Compressing objects: 100% (263/263), done.
remote: Total 2319 (delta 135), reused 167 (delta 82), pack-reused 1968
Receiving objects: 100% (2319/2319), 29.95 MiB | 4.80 MiB/s, done.
Resolving deltas: 100% (1358/1358), done.
(playground) ➜ erase cd lwmdb
(playground) ➜ lwmdb git:(main) java -jar ~/Downloads/bfg-1.14.0.jar --delete-folders fixture-files --delete-files fixture-files --private
Using repo : /Users/ahughes/erase/lwmdb/.git
Found 134 objects to protect
Found 17 commit-pointing refs : HEAD, refs/heads/main, refs/remotes/origin/HEAD, ...
Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit 63f18ff4 (protected by 'HEAD') - contains 17 dirty files :
- fixture-files/JISC papers.csv (14.2 KB)
- fixture-files/UKDA-8613-csv/1851_rsd_data.csv (1.4 MB)
- ...
WARNING: The dirty content above may be removed from other commits, but as
the *protected* commits still use it, it will STILL exist in your repository.
Details of protected dirty content have been recorded here :
/Users/ahughes/erase/lwmdb.bfg-report/2023-06-30/11-23-15/protected-dirt/
If you *really* want this content gone, make a manual commit that removes it,
and then run the BFG on a fresh copy of your repo.
Cleaning
--------
Found 370 commits
Cleaning commits: 100% (370/370)
Cleaning commits completed in 163 ms.
Updating 13 Refs
----------------
Ref Before After
--------------------------------------------------------------------
refs/heads/main | 63f18ff4 | a1649c52
refs/remotes/origin/asmith-review-docs | e8196742 | d8a0bed9
refs/remotes/origin/fix-mitchells-import | c9032006 | 9dc8c58b
refs/remotes/origin/geocensus | dd31fd0f | 5bf21c44
refs/remotes/origin/improve-load-json-fixtures | 513738d3 | 56e47072
refs/remotes/origin/item-max-title-field | 6339b3e3 | b9e2e8c9
refs/remotes/origin/jupyterhub | 9e716305 | 6d7cd451
refs/remotes/origin/kallewesterling/issue35 | c8429d77 | aec87a1c
refs/remotes/origin/kallewesterling/issue56 | ebf57d41 | 6e04d95a
refs/remotes/origin/main | 63f18ff4 | a1649c52
refs/remotes/origin/mkdocs | 29b13aec | f8d69bfb
refs/remotes/origin/production-deploy | 738bfbab | dc84a5de
refs/remotes/origin/thobson/issue47 | 0fed749d | 31999d4d
Updating references: 100% (13/13)
...Ref update completed in 30 ms.
Commit Tree-Dirt History
------------------------
Earliest Latest
| |
......................DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)
Before After
-------------------------------------------
First modified commit | ce708d9f | e16706f4
Last dirty commit | c9032006 | 9dc8c58b
In total, 489 object ids were changed. Full details are logged here:
/Users/ahughes/erase/lwmdb.bfg-report/2023-06-30/11-23-15
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive
(playground) ➜ lwmdb git:(main) git reflog expire --expire=now --all && git gc --prune=now --aggressive
Enumerating objects: 2299, done.
Counting objects: 100% (2299/2299), done.
Delta compression using up to 10 threads
Compressing objects: 100% (2184/2184), done.
Writing objects: 100% (2299/2299), done.
Total 2299 (delta 1400), reused 589 (delta 0), pack-reused 0
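BFG's report above explains why the files can survive a run. The commit at HEAD (63f18ff4 here) is protected, so its fixture-files content is left in place. BFG prints its own advice: remove the folder in a normal commit first, then run BFG on a fresh copy. Combining that with the mirror-clone steps from BFG's usage docs gives a dry-run plan. `bfg_plan` is a hypothetical helper that only returns command lists and runs nothing. It stops before the final push, which is the step that hit the refs/pull/* rejections earlier in this thread. Note that `--delete-folders` matches a folder name anywhere in the tree, not a path from the repository root.

```python
from typing import List


def bfg_plan(repo_url: str, folder: str, bfg_jar: str = "bfg.jar") -> List[List[str]]:
    """Return the BFG purge steps as argument lists (dry run, nothing executed)."""
    # A mirror clone of ".../lwmdb" or ".../lwmdb.git" lands in "lwmdb.git".
    mirror = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if not mirror.endswith(".git"):
        mirror += ".git"
    return [
        # 1. In a normal working clone: drop the folder from main's tip, since
        #    BFG never alters the protected HEAD commit. --cached keeps the
        #    local copies on disk.
        ["git", "rm", "-r", "--cached", folder],
        ["git", "commit", "-m", f"Remove {folder}"],
        ["git", "push", "origin", "main"],
        # 2. Rewrite history in a fresh bare mirror clone.
        ["git", "clone", "--mirror", repo_url],
        ["java", "-jar", bfg_jar, "--delete-folders", folder, mirror],
        # 3. Expire reflogs and garbage-collect the now-unreferenced blobs.
        ["git", "-C", mirror, "reflog", "expire", "--expire=now", "--all"],
        ["git", "-C", mirror, "gc", "--prune=now", "--aggressive"],
    ]
```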
I don't have write permissions, but does this look like what you had, @griff-rees? I used the jar file directly from the linked site.
Cool! I think I got that far; it was the push to main that failed.
I need to sort out your permissions. I'm also going to make another merge to main, so it'll need one more checkout before you have another go.
https://github.com/rtyley/bfg-repo-cleaner/issues/36#issuecomment-460922708 - see this comment
Yeah, I saw that when I hit this before. I had other urgent stuff, so I left it.
@AoifeHughes, you've got admin rights. With great power... ;)
Okay, just for reference: I got the same errors as @griff-rees. I tried removing branch protections and also git push -f --set-upstream origin main, but couldn't get it to budge.
Thanks so much, @AoifeHughes: it really helps to have that reproduced (and to know I didn't miss something obvious!). There are other routes that don't use bfg... but they're hard.
Another option: https://github.com/newren/git-filter-repo
@griff-rees, can you check whether this has been done? I think I got it working.
FYI, this was done with:
git-filter-repo --invert-paths --path fixture-files
Ah, lovely! I think we need to check the history to be sure. We probably also need to add fixture-files to .gitignore to be safe, but I think the hardest part's done. Lovely, lovely work.
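One way to check the history (an assumption, not what was run in this thread) is to list every object reachable from any ref with `git rev-list --objects --all` and look for paths under fixture-files. The same sketch also adds an ignore entry so the folder is not committed again by accident. `reachable_objects`, `leftover_paths` and `ignore_folder` are hypothetical helpers. An empty result only covers this clone's refs. Reflog-only objects are not listed, and other clones or forks still hold the old history.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def reachable_objects(repo: str = ".") -> List[str]:
    """Raw `git rev-list --objects --all` lines ("<sha>" or "<sha> <path>")."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def leftover_paths(rev_list_lines: Iterable[str], folder: str) -> List[str]:
    """Return the folder itself or any path under it found in the listing."""
    name = folder.strip("/")
    hits = set()
    for line in rev_list_lines:
        parts = line.split(maxsplit=1)  # keeps paths containing spaces intact
        if len(parts) == 2 and (parts[1] == name or parts[1].startswith(name + "/")):
            hits.add(parts[1])
    return sorted(hits)


def ignore_folder(gitignore: Path, folder: str) -> bool:
    """Append "/<folder>/" to .gitignore unless present; True if it was added."""
    entry = "/" + folder.strip("/") + "/"
    text = gitignore.read_text() if gitignore.exists() else ""
    if entry in text.splitlines():
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    gitignore.write_text(text + entry + "\n")
    return True
```

Run from the root of a clone that has fetched every branch, `leftover_paths(reachable_objects(), "fixture-files")` should come back empty after the purge.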
closing as data is gone 😄
This may require purging the git history, and it's worth checking with @claireaustin01.