Closed tuxdna closed 6 years ago
With the above data in the file golang-400-job.txt, and using the following script, find_failing.py:
import itertools
import re

# Collect every package named in a "Couldn't get latest commit" log line.
data = []
with open("golang-400-job.txt") as f:
    for line in f:
        m = re.match(r".*Couldn't get latest commit for (\S+).*", line.strip())
        if m:
            data.append(m.group(1))

# Sort so packages sharing a host are adjacent, as itertools.groupby requires.
data = sorted(data)
grouped_repos = [
    (k, [parts[1:] for parts in v])
    for k, v in itertools.groupby((x.split("/") for x in data), lambda x: x[0])
]
# Hosts with the most failing packages come first.
grouped_repos = sorted(grouped_repos, key=lambda x: -len(x[1]))
for k, v in grouped_repos:
    print(k, ": No of packages", len(v))
$ python find_failing.py
github.com : No of packages 143
golang.org : No of packages 35
gopkg.in : No of packages 17
google.golang.org : No of packages 13
launchpad.net : No of packages 3
bazil.org : No of packages 2
k8s.io : No of packages 2
labix.org : No of packages 2
bitbucket.org : No of packages 1
git.apache.org : No of packages 1
git.eclipse.org : No of packages 1
The failure is caused by https://github.com/fabric8-analytics/fabric8-analytics-jobs/blob/37fa4aaeb0d7a0ce56ee2f337c5731e48a9de741/f8a_jobs/handlers/golang_popular_analyses.py#L14
For example, the package github.com/garyburd/redigo/redis results in the following URL:
>>> url = 'https://{p}/commits/master'.format(p='github.com/garyburd/redigo/redis')
>>> print(url)
https://github.com/garyburd/redigo/redis/commits/master
The above URL returns a 404 error.
We should instead pick the first three components of the package name to form the URL: github.com/garyburd/redigo
In [33]: package='github.com/garyburd/redigo'
In [34]: %paste
def _get_latest_commit(package):
    if package.startswith('github.com'):
        url = 'https://{p}/commits/master'.format(p=package)
        response = requests.get(url)
        if response.status_code == 200:
            page = BeautifulSoup(response.text, 'html.parser')
            commit_links = page.find_all(class_='commit-links-group BtnGroup')
            if commit_links:
                commit_tag = commit_links[0].find_next('a')
                if commit_tag:
                    link = commit_tag.get('href', '')
                    if link and '/' in link:
                        return link.split('/')[-1]
    return None
## -- End pasted text --
In [35]: _get_latest_commit(package)
Out[35]: '47dc60e71eed504e3ef8e77ee3c6fe720f3be57f'
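The truncation proposed above could look roughly like the sketch below. The helper names repo_root and commits_url are hypothetical and are not part of the existing handler. The three-component rule is assumed to hold for github.com-style paths (host/owner/repo). Other hosts such as gopkg.in or golang.org may lay out their paths differently and would need separate handling.

```python
def repo_root(package, components=3):
    """Return the first `components` slash-separated parts of a package path.

    Hypothetical helper: assumes a host/owner/repo layout as used on github.com.
    """
    parts = package.strip("/").split("/")
    return "/".join(parts[:components])


def commits_url(package):
    # Same URL template as in the handler, applied to the truncated path.
    return "https://{p}/commits/master".format(p=repo_root(package))


print(commits_url("github.com/garyburd/redigo/redis"))
# https://github.com/garyburd/redigo/commits/master
```

A package that already has three or fewer components passes through unchanged, so the helper can be applied to every github.com package before the URL is built.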
Any golang package whose name contains more than three components fails because the latest commit information for that package cannot be found. Below is a list of packages (from the top 400 popular golang packages) that fail to schedule: