LLNL / scraper

Python library for getting metadata from source code hosting tools
MIT License
49 stars 23 forks

Error when attempting to access private repo #30

Open jfredrickson5 opened 5 years ago

jfredrickson5 commented 5 years ago

Attempting to run scraper on a GitHub org with private repos results in an error.

Output:

% scraper --config config.json                     
2019-04-23 17:29:12,536 - INFO: Connected to: https://github.com                                     
2019-04-23 17:29:12,773 - INFO: Processing: GSA/private-test                                         
Traceback (most recent call last):
  File "/home/jf/.pyenv/versions/3.7.0/bin/scraper", line 11, in <module>                            
    load_entry_point('llnl-scraper', 'console_scripts', 'scraper')()                                 
  File "/home/jf/gsa/scraper/scraper/gen_code_gov_json.py", line 76, in main                         
    code_json = code_gov.process_config(config_json)                                                 
  File "/home/jf/gsa/scraper/scraper/code_gov/__init__.py", line 58, in process_config               
    code_gov_project = Project.from_github3(repo, labor_hours=compute_labor_hours)                   
  File "/home/jf/gsa/scraper/scraper/code_gov/models.py", line 217, in from_github3                  
    elif date_parse(repository.created_at) < POLICY_START_DATE:                                      
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1356, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 645, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)                                             
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 721, in _parse
    l = _timelex.split(timestr)         # Splits the timestr into tokens                             
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 207, in split
    return list(cls(s))
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 76, in __init__
    '{itype}'.format(itype=instream.__class__.__name__))                                             
TypeError: Parser must be a string or character stream, not datetime

Here is a simplified config.json as a test case. The GSA/private-test repo is private and contains a README.md file.

{
  "agency": "GSA",
  "contact_email": "github-admins@gsa.gov",
  "GitHub": [
    {
      "public_only": false,
      "repos": [
        "GSA/private-test"
      ]
    }
  ]
}

Example of a real config.json where we encountered the issue. It scans properly until it arrives at a private repo, at which point it crashes.

{
  "agency": "GSA",
  "contact_email": "github-admins@gsa.gov",
  "GitHub": [
    {
      "public_only": false,
      "orgs": [
        "GSA",
        "18F",
        "presidential-innovation-fellows",
        "USWDS"
      ]
    }
  ]
}

Verified that my GitHub access token is valid and can view private repos by using the same token for a different script.

IanLee1521 commented 5 years ago

Interesting... Can you post the output of pip list? Specifically, I'm looking for the version of github3.py.
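
If the full pip list output is more than needed, the versions of the two packages involved in the traceback can also be read from Python directly. This is only a minimal sketch: it assumes Python 3.8 or newer (where importlib.metadata is in the standard library), and the helper name installed_version is made up for illustration and is not part of scraper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    `installed_version` is a hypothetical helper for this thread, not a scraper API.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The two distributions that appear in the traceback above.
    for name in ("github3.py", "python-dateutil"):
        print(name, installed_version(name) or "not installed")
```

Run it with the same interpreter that runs scraper, so it reports on the same environment.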

jfredrickson5 commented 5 years ago

Here's pip list:

Package           Version    Location                                                                 
----------------- ---------- --------------------                                                     
asn1crypto        0.24.0                                                                              
certifi           2018.8.24                                                                           
cffi              1.12.3                                                                              
chardet           3.0.4                                                                               
cryptography      2.6.1                                                                               
decorator         4.3.0                                                                               
github3.py        1.2.0                                                                               
idna              2.7                                                                                 
isodate           0.6.0                                                                               
jwcrypto          0.6.0                                                                               
llnl-scraper      0.8.0.dev0 /home/jf/src/scraper                                                     
mock              2.0.0                                                                               
msrest            0.6.6                                                                               
oauthlib          3.0.1                                                                               
pbr               4.2.0                                                                               
pip               19.0.3                                                                              
pycparser         2.19                                                                                
python-dateutil   2.7.3                                                                               
python-gitlab     1.6.0                                                                               
requests          2.19.1                                                                              
requests-oauthlib 1.2.0                                                                               
setuptools        39.0.1                                                                              
six               1.11.0                                                                              
stashy            0.5
uritemplate       3.0.0
uritemplate.py    3.0.2
urllib3           1.23
virtualenv        16.1.0
vsts              0.1.25

jfredrickson5 commented 5 years ago

Huh, now I'm super confused. I nuked my pyenv and started fresh. Now the repository.created_at property is a string and PR #32 no longer works for me.

I had this debugging output when I was working on the change: print("repository.created_at type: ", type(repository.created_at))

It previously output datetime and now it's str.

Here's my latest pip list:

Package           Version    Location
----------------- ---------- ---------------------
asn1crypto        0.24.0
astroid           2.2.5
certifi           2019.3.9
cffi              1.12.3
chardet           3.0.4
cryptography      2.6.1
decorator         4.4.0
github3.py        1.2.0
idna              2.8
isodate           0.6.0
isort             4.3.17
jwcrypto          0.6.0
lazy-object-proxy 1.3.1
llnl-scraper      0.8.0.dev0 /Users/jf/gsa/scraper
mccabe            0.6.1
mock              2.0.0
msrest            0.6.6
oauthlib          3.0.1
pbr               5.2.0
pip               19.1
pycparser         2.19
pylint            2.3.1
python-dateutil   2.8.0
python-gitlab     1.8.0
requests          2.21.0
requests-oauthlib 1.2.0
setuptools        40.8.0
six               1.12.0
stashy            0.6
typed-ast         1.3.5
uritemplate       3.0.0
urllib3           1.24.2
vsts              0.1.25
wrapt             1.11.1

Possibly user error due to a bad environment? No idea. I'm going to see if I can replicate it and if not, maybe we can close this.

IanLee1521 commented 5 years ago

Thanks for the additional information @jfredrickson5 .

FWIW, you're not crazy... I've seen very similar behavior. I think there is a package in the dependency chain that is changing its behavior... I've thought about trying to add some exception handling there to "do the right thing" but haven't gotten all the way there yet. If you're interested in adding that, I'd welcome the addition!
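
One way to "do the right thing" regardless of which type shows up could be to normalize the value before comparing it. The sketch below is not the change from PR #32 and not scraper's actual code. The helper name to_datetime and the sample timestamp are assumptions for illustration, and it uses the standard library's datetime.fromisoformat (Python 3.7+) instead of dateutil so that it runs on its own.

```python
from datetime import datetime


def to_datetime(value):
    """Return a datetime whether `value` is already one or an ISO 8601 string.

    Hypothetical helper: it covers both cases seen in this issue, where
    repository.created_at was a datetime in one environment and a str in another.
    """
    if isinstance(value, datetime):
        # Already parsed upstream; passing it to a string parser is what raised
        # "Parser must be a string or character stream, not datetime".
        return value
    if isinstance(value, str):
        # Timestamps such as "2019-04-23T17:29:12Z" end in "Z"; older Pythons'
        # fromisoformat rejects that suffix, so use an explicit UTC offset.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        return datetime.fromisoformat(value)
    raise TypeError(
        "expected str or datetime, got {}".format(type(value).__name__)
    )


if __name__ == "__main__":
    # Sample value only; not taken from a real repository.
    as_string = "2019-04-23T17:29:12Z"
    as_datetime = to_datetime(as_string)
    # Both input forms end up as the same datetime object type and value.
    print(to_datetime(as_string) == to_datetime(as_datetime))
```

With something like this in place, the comparison in from_github3 could call the helper instead of date_parse directly. Note that comparing a timezone-aware result against a naive POLICY_START_DATE would still fail, so both sides would need to agree on that.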

IanLee1521 commented 5 months ago

@jfredrickson5 - I see you closed the MR, are you thinking that this is resolved too? Or did we still need to fix something?

jfredrickson5 commented 5 months ago

@IanLee1521 I'm not sure what notification GitHub sent you, but I'm in the process of merging my separate personal and work GitHub accounts into one, so I think that must have unintentionally triggered something; I haven't actually made changes to this issue.

IanLee1521 commented 5 months ago

It was the MR that got closed: https://github.com/LLNL/scraper/pull/32

but ah, I see it was auto-closed by deleting a reference:

image

jfredrickson5 commented 5 months ago

Ah, that was my personal fork that disappeared then. It's been a while so I don't know if the change is still valid, but feel free to grab the change and use it.