C0D3D3V / Moodle-Downloader

A Moodle crawler that downloads course content from Moodle (e.g. lecture PDFs)
GNU General Public License v3.0

HTML files are downloaded repeatedly #5

Closed C0D3D3V closed 6 years ago

C0D3D3V commented 6 years ago

Suggestion: re-download an HTML file only when it has changed in an important way, or ignore HTML files completely.

C0D3D3V commented 6 years ago

Even identical HTML files are downloaded repeatedly.

C0D3D3V commented 6 years ago

First step done: https://github.com/C0D3D3V/Moodle-Crawler/tree/issue-5

C0D3D3V commented 6 years ago

Only partially solved.

C0D3D3V commented 6 years ago

New idea: make a text diff of the HTML files and re-crawl a file only if the hash of its text is different. Additionally, save the hash in the log (together with the path) so that a file is not re-crawled just because it was moved. See https://github.com/C0D3D3V/Moodle-Crawler/tree/issue-5b for progress.
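
A minimal sketch of the idea above, not the code in the linked branch. It assumes the log can be modelled as a dict from text hash to path, and that SHA-256 is an acceptable hash (the issue does not name one). The names `text_hash`, `needs_download` and `log` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def text_hash(html):
    """Hash only the whitespace-normalised text of an HTML document."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_download(html, path, log):
    """Return True if no file with the same text hash is in the log yet.

    `log` is a hypothetical dict {hash: path}. Keying by hash rather than
    by path means a moved but unchanged file is not crawled again.
    """
    digest = text_hash(html)
    if digest in log:
        return False
    log[digest] = path
    return True
```

Because only the extracted text is hashed, changes confined to markup or embedded scripts (for example a session token in a `<script>` block) do not trigger a new download. Whether that is the right notion of an "important change" for Moodle pages is a judgment call.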