JosephLai241 / URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.
https://josephlai241.github.io/URS/
MIT License

Fixed '[Errno 36] File name too long' issue making it impossible to save comment scrapes with long titles. #19

Closed: LukeDSchenk closed this pull request 4 years ago

LukeDSchenk commented 4 years ago

Overview

Summary

Added a _check_len() function to the Export.NameFile class to ensure generated filenames are not too long, because an overly long filename causes an error when a scrape is written to file. Added _check_len() calls to Subreddit, Redditor, and comments scraping. These changes should prevent scrapes from failing because of overly long generated filenames.
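The PR text does not show the body of _check_len(). The following is a minimal sketch of one way such a check could work, assuming the fix truncates the title-derived part of the filename so the whole name fits a byte limit. The names check_len and MAX_NAME_BYTES and the 255-byte limit are assumptions for illustration, not URS code.

```python
# Hypothetical default: ext4 and many other Linux filesystems limit a single
# filename to 255 bytes. This is an assumption, not a value taken from URS.
MAX_NAME_BYTES = 255


def check_len(stem: str, extension: str, max_bytes: int = MAX_NAME_BYTES) -> str:
    """Join stem and extension, truncating stem so the UTF-8 encoded
    filename fits in max_bytes.

    Hypothetical helper for illustration only; it is not URS's _check_len().
    """
    budget = max_bytes - len(extension.encode("utf-8"))
    if budget <= 0:
        raise ValueError("extension alone exceeds the filename length limit")

    encoded = stem.encode("utf-8")
    if len(encoded) > budget:
        # Slice at the byte budget; errors="ignore" drops a multi-byte
        # character that the slice cut in half, so the result stays valid UTF-8.
        stem = encoded[:budget].decode("utf-8", errors="ignore")
    return stem + extension


# Usage: a 300-character thread title no longer produces an oversized name.
title = "a" * 300
filename = check_len(title, ".json")
print(len(filename.encode("utf-8")))  # 255
```

The sketch measures length in encoded bytes rather than characters, since filesystem limits are usually expressed in bytes and a title with non-ASCII characters can exceed the limit even when its character count looks safe.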

Motivation/Context

When using the comments scrape option, scrapes are automatically written to a file whose name includes the title of the comment thread. If a thread title is long (140+ characters), the generated filename can exceed the operating system's filename length limit, raising "[Errno 36] File name too long", and the scrape fails to write to the designated file. In a nutshell, when scraping comment threads with long titles you would sit and wait for the scrape to finish, only to find out that your data was lost due to a bad filename :).

New Dependencies

None

Issue Fix or Enhancement Request

Not applicable

Type of Change

Breaking Change

Not applicable (I have included some scrape logs for reference anyway)

List All Changes That Have Been Made

How Has This Been Tested?

Test Configuration

Dependencies

astroid==2.4.1
attrs==19.3.0
certifi==2020.4.5.1
chardet==3.0.4
colorama==0.4.3
coverage==5.1
idna==2.9
isort==4.3.21
lazy-object-proxy==1.4.3
mccabe==0.6.1
more-itertools==8.3.0
packaging==20.4
pluggy==0.13.1
praw==7.0.0
prawcore==1.3.0
prettytable==0.7.2
py==1.8.1
pylint==2.5.2
pyparsing==2.4.7
pytest==5.4.3
pytest-cov==2.10.0
requests==2.23.0
six==1.14.0
toml==0.10.0
update-checker==0.17
urllib3==1.25.9
wcwidth==0.2.4
websocket-client==0.57.0
wrapt==1.12.1

Checklist


codecov-commenter commented 4 years ago

Codecov Report

Merging #19 into master will increase coverage by 0.04%. The diff coverage is 87.50%.


@@            Coverage Diff             @@
##           master      #19      +/-   ##
==========================================
+ Coverage   73.07%   73.11%   +0.04%     
==========================================
  Files          25       25              
  Lines        1998     2005       +7     
==========================================
+ Hits         1460     1466       +6     
- Misses        538      539       +1     
Impacted Files         Coverage Δ
urs/utils/Export.py    94.73% <87.50%> (-0.92%) ⬇


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 6b0a80b...1f7540b.
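The project coverage figures in the report follow from hits divided by tracked lines (1460/1998 on master, 1466/2005 with #19). A minimal sketch that reproduces them, assuming Codecov rounds down to two decimal places; that rounding mode is an assumption here, but it is consistent with 1466/2005 ≈ 73.117% being reported as 73.11% rather than 73.12%. The helper name coverage_pct is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Percentage of tracked lines that were hit, rounded down to 2 decimals.

    Hypothetical helper; the round-down rule is an assumption about Codecov.
    """
    pct = Decimal(hits) * 100 / Decimal(lines)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(1460, 1998)  # master
head = coverage_pct(1466, 2005)  # PR #19
print(base, head, head - base)   # 73.07 73.11 0.04

# Misses are simply tracked lines that were not hit.
print(1998 - 1460, 2005 - 1466)  # 538 539
```

Decimal is used instead of float so the +0.04% difference comes out exactly rather than as 0.0399999….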

LukeDSchenk commented 4 years ago

🌚🥴🌝