FOGProject / fog-docs

Documentation for FOG 1.X
GNU General Public License v3.0

Obtain local versions of Mediawiki files #18

Open darksidemilk opened 1 year ago

darksidemilk commented 1 year ago

For an initial conversion of what's in the wiki, we can run a local copy of pandoc.

We first need to either find a built-in MediaWiki method to get the source files or build a web scraper to get that content.
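One built-in route is the MediaWiki Action API (`api.php`), which can return raw wikitext as JSON. The following is only a minimal sketch of that idea, not the script used later in this thread. The `API_URL` value and the function names are hypothetical, and it assumes the wiki's MediaWiki version is recent enough to accept `formatversion=2`. `Special:Export` is another built-in option.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: replace with the wiki's real api.php URL.
API_URL = "https://wiki.example.org/api.php"


def build_query(title):
    """Return the query string asking for the latest wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return urlencode(params)


def extract_wikitext(response):
    """Pull the wikitext out of a decoded API response, or return None."""
    pages = response.get("query", {}).get("pages", [])
    for page in pages:
        for rev in page.get("revisions", []):
            slots = rev.get("slots", {})
            if "main" in slots:
                return slots["main"].get("content")
            # Assumption: older releases put the text directly on the revision.
            if "content" in rev:
                return rev["content"]
    return None


def fetch_wikitext(title, api_url=API_URL):
    """Fetch one page over HTTP (needs network access and a real api_url)."""
    with urlopen(f"{api_url}?{build_query(title)}") as resp:
        return extract_wikitext(json.load(resp))
```

Keeping the parsing in `extract_wikitext` separate from the HTTP call makes it possible to check the logic against a saved response before pointing it at the live wiki.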

These would likely go in a to-convert folder along with the converted md files, which would gradually be vetted, updated, and moved into their new homes.
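As a rough sketch of the conversion step, pandoc can read the `mediawiki` format and write GitHub-flavored Markdown (`gfm`). The `.wiki` extension, the folder arguments, and the function names below are assumptions for illustration, not the layout the repo actually uses.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(src, dest_dir):
    """Build the pandoc call that turns one wikitext file into Markdown."""
    src = Path(src)
    dest = Path(dest_dir) / (src.stem + ".md")
    return ["pandoc", "--from", "mediawiki", "--to", "gfm",
            str(src), "--output", str(dest)]


def convert_all(src_dir, dest_dir, pattern="*.wiki"):
    """Convert every matching file; returns the list of files converted."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(Path(src_dir).glob(pattern)):
        # check=True stops the run at the first file pandoc cannot convert.
        subprocess.run(pandoc_command(src, dest_dir), check=True)
        converted.append(src)
    return converted
```

Something like `convert_all("wikiArchive", "to-convert")` would then fill the to-convert folder, leaving the vetting as a manual step.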

Sebastian-Roth commented 1 year ago

We first need to either find a built-in MediaWiki method to get the source files or build a web scraper to get that content.

I think I have done something similar in another project when moving from one wiki to another to make absolutely sure the contents were all there. Pulled it all down from both wikis and compared the contents.

I am sure I have the (python?) scripts ready somewhere. Will check it out and share it here.

Sebastian-Roth commented 1 year ago

@darksidemilk Give this a try: https://gist.github.com/Sebastian-Roth/4e660a35b5c5be751c7f459b9f161cb1 (should work out of the box)

darksidemilk commented 1 year ago

@Sebastian-Roth It started out great but hit a snag. When it gets to this page, I get this Python error:

title: Add & Extend a 2nd Virtual HDD, id: 4778, revs: 7
Traceback (most recent call last):
  File "..\fog-docs\wikiArchive\get-wikifiles.py", line 38, in <module>
    f.write(content)
  File "..AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I tried a couple of Python 3 versions. Do you know if it matters which version of Python is used? Maybe there's just something off in one of the revisions of that file?

darksidemilk commented 1 year ago

I added a try/except:

with open(fn, 'w') as f:
    try:
        f.write(content)
    except:
        print(f"write failed")

That got it to continue through.
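The traceback above passes through `encodings\cp1252.py`, which suggests the file was opened with the Windows default encoding and the page contained a character cp1252 cannot represent. The final exception line is not shown, so this is an assumption. If it holds, a possible alternative to skipping the write is to open the file as UTF-8 explicitly. `write_page` below is a hypothetical helper, not part of the original script.

```python
from pathlib import Path


def write_page(path, content):
    """Write wikitext as UTF-8 regardless of the platform's default encoding."""
    # Without encoding=..., open() falls back to the locale default
    # (cp1252 on many Windows setups), which cannot encode every character.
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return Path(path)
```

Unlike the try/except, this keeps the revision's content instead of leaving an empty file behind when the write fails.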

darksidemilk commented 1 year ago

I think I might have lost one file when filtering down to the first revision only, but I'm not 100% sure. My PowerShell filtering showed 318 unique names once I removed all the _rev## strings, but there are 317 files in the wikiArchive folder after I filtered it. I'll try to figure out what may have been lost.

Never mind, it was the Python script that got deleted. We have all the wiki files local to the repo now. We don't have all the images, but this will still help a lot.
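The count comparison described above can be sketched in Python as well. This is only an illustration: it assumes file names of the form `<page>_rev<number><extension>`, and the function names are hypothetical. The actual filtering in this thread was done in PowerShell.

```python
import re
from collections import defaultdict

# Assumed naming: "<page>_rev<number><extension>", e.g. "Foo_rev3.wiki".
REV_SUFFIX = re.compile(r"_rev(\d+)")


def group_revisions(filenames):
    """Map each base name (revision marker removed) to its revision numbers."""
    groups = defaultdict(list)
    for name in filenames:
        match = REV_SUFFIX.search(name)
        base = REV_SUFFIX.sub("", name)
        # Files without a revision marker are recorded with None.
        groups[base].append(int(match.group(1)) if match else None)
    return dict(groups)


def missing_after_filter(all_files, kept_files):
    """Base names present before filtering but absent afterwards."""
    before = set(group_revisions(all_files))
    after = set(group_revisions(kept_files))
    return sorted(before - after)
```

Comparing the two sets of base names points directly at whichever name accounts for an off-by-one difference such as 318 versus 317.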