RichardLitt / low-resource-languages

Resources for conservation, development, and documentation of low resource (human) languages.
Creative Commons Attribution Share Alike 4.0 International

Link issues #100

Closed RichardLitt closed 8 years ago

RichardLitt commented 8 years ago

Ran Travis to check links and ran into these issues:

Issues :-(
> Links 
1. 302 https://github.com/RichardLitt/endangered-languages/edit/master/README.md
2. 404 https://en.wikipedia.org/wiki/Language_preservation)
3. 404 https://en.wikipedia.org/wiki/Open_source)
4. 302 http://wesay.org
5. 301 http://cdec-decoder.org/
6. 301 http://goo.gl/wdnz1W
7. 404 https://wiki.mozilla.org/B2G.
8. 404 http://hunspell.cvs.sourceforge.net/hunspell
9. https://victorio.uit.no/langtech/trunk/ SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
10. 301 http://nltk.github.com/
11. 503 http://www.onlinelinguisticdatabase.org
12. 404 http://dev.panlex.org/tools/
13. 301 http://ilk.uvt.nl/timbl/
14. 302 http://www.wavesurfer.fm
15. https://dative.lingsync.org/ hostname "dative.lingsync.org" does not match the server certificate
16. https://lexicondev.lingsync.org/ SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
17. https://lexicondev.lingsync.org/analysisbytierbyword/inuktitut/nunaqjuaqli%20aaqkiksimalaunngilaq%20sunataqaranilu%20itijuqjuamik%20taaqtualuulluni%20guutiullu%20anirngninga%20ingirralauqpuq%20imaaluup%20qulaagut SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
18. http://geowords.ga/ getaddrinfo: Name or service not known
19. 404 http://www.ark.cs.cmu.edu/TurboParser/nasmith_models/kin-turbo-v1.0.tgz
20. 404 https://github.com/FieldDB/migmaqLessons
21. 404 https://github.com/cidles/mindericobot
22. 404 http://gielese.no
23. http://qaamuus.so/ getaddrinfo: Name or service not known
> Dupes 
  1. https://github.com/sindresorhus/awesome
  2. https://img.shields.io/github/stars/FieldDB/DictionaryChromeExtension.svg
  3. https://github.com/FieldDB/DictionaryChromeExtension
  4. https://img.shields.io/github/stars/sillsdev/wesay.svg
  5. https://github.com/sillsdev/wesay
  6. http://wesay.org
  7. https://img.shields.io/github/stars/clld/clld.svg
  8. https://github.com/clld/clld
  9. http://sourceforge.net/projects/cmusphinx/
  10. https://gerrit.lsdev.sil.org/
  11. https://github.com/clld/glottolog-data
  12. https://github.com/hyphenliu/cnminlangwebcollect
  13. https://github.com/leebock/languages
  14. https://github.com/nltk/nltk
  15. http://nltk.github.com/
  16. https://github.com/LowResourceLanguages/Ojibway-iphone-app
  17. http://www.poio.eu
  18. https://github.com/clld/tsammalex-data
  19. https://github.com/batumi/KartuliChromeExtension
  20. https://github.com/FieldDB
  21. https://wwwdev.lingsync.org/lingllama/lingllama-communitycorpus
  22. https://img.shields.io/github/stars/LowResourceLanguages/hltdi-morphology.svg
  23. https://github.com/LowResourceLanguages/hltdi-morphology

HughP commented 8 years ago

Is there a parse error with issues like "3. 404 https://en.wikipedia.org/wiki/Open_source)"? There is a ")" in the URL which shouldn't be there and the link actually resolves in the document.
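A minimal sketch of how a link extractor could avoid picking up the closing ")" of a Markdown link (or a sentence-ending "."), assuming a simple regex-based scanner. This is not awesome_bot's actual implementation; `URL_PATTERN` and `extract_urls` are hypothetical names used only for illustration.

```python
import re

# Assumption: a URL runs until whitespace, a quote, an angle bracket, or a square bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'\[\]]+")


def extract_urls(text):
    """Return the URLs in text, trimming trailing characters that belong to the prose."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0)
        while url:
            last = url[-1]
            if last in ".,;:!?":
                # Sentence punctuation directly after the URL.
                url = url[:-1]
            elif last == ")" and url.count(")") > url.count("("):
                # Unbalanced ")" is most likely the end of a Markdown (...) target.
                url = url[:-1]
            else:
                break
        urls.append(url)
    return urls


sample = (
    "See [open source](https://en.wikipedia.org/wiki/Open_source) "
    "and https://wiki.mozilla.org/B2G."
)
print(extract_urls(sample))
# ['https://en.wikipedia.org/wiki/Open_source', 'https://wiki.mozilla.org/B2G']
```

Balanced parentheses are left alone, so a URL ending in `Foo_(bar)` keeps its closing parenthesis.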

RichardLitt commented 8 years ago

@dkhamsing what do you think about that issue?

dkhamsing commented 8 years ago

There was a parsing issue in a previous version. The current version finds these results:

> Links 
1. 302 https://github.com/RichardLitt/endangered-languages/edit/master/README.md
2. 302 http://wesay.org
3. 301 http://cdec-decoder.org/
4. 301 http://goo.gl/wdnz1W
5. 404 http://hunspell.cvs.sourceforge.net/hunspell
6. 301 http://nltk.github.com/
7. 503 http://www.onlinelinguisticdatabase.org
8. 404 http://dev.panlex.org/tools/
9. 301 http://ilk.uvt.nl/timbl/
10. 302 http://www.wavesurfer.fm
11. https://dative.lingsync.org/ hostname "dative.lingsync.org" does not match the server certificate
12. https://lexicondev.lingsync.org/ SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
13. https://lexicondev.lingsync.org/analysisbytierbyword/inuktitut/nunaqjuaqli%20aaqkiksimalaunngilaq%20sunataqaranilu%20itijuqjuamik%20taaqtualuulluni%20guutiullu%20anirngninga%20ingirralauqpuq%20imaaluup%20qulaagut SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
14. 404 http://www.ark.cs.cmu.edu/TurboParser/nasmith_models/kin-turbo-v1.0.tgz
15. 404 https://github.com/FieldDB/migmaqLessons
16. 404 https://github.com/cidles/mindericobot
17. 404 http://gielese.no
> Dupes 
  1. https://github.com/sindresorhus/awesome
  2. https://wiki.mozilla.org/B2G
  3. https://github.com/nltk/nltk
  4. https://github.com/FieldDB
  5. https://img.shields.io/github/stars/LowResourceLanguages/hltdi-morphology.svg
  6. https://github.com/LowResourceLanguages/hltdi-morphology

dkhamsing commented 8 years ago

Actually, with your configuration, the current results are:

awesome_bot README.md --white-list https://github.com/sindresorhus/awesome,https://github.com/FieldDB,https://img.shields.io/github/stars/LowResourceLanguages/hltdi-morphology.svg,https://github.com/LowResourceLanguages/hltdi-morphology

https://travis-ci.org/RichardLitt/endangered-languages/builds/99912041

> Links 
1. 302 https://github.com/RichardLitt/endangered-languages/edit/master/README.md
2. 302 http://wesay.org
3. 301 http://cdec-decoder.org/
4. 301 http://goo.gl/wdnz1W
5. 404 http://hunspell.cvs.sourceforge.net/hunspell
6. https://victorio.uit.no/langtech/trunk/ SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
7. 301 http://nltk.github.com/
8. 404 http://dev.panlex.org/tools/
9. 503 http://www.onlinelinguisticdatabase.org
10. 301 http://ilk.uvt.nl/timbl/
11. 302 http://www.wavesurfer.fm
12. https://dative.lingsync.org/ hostname "dative.lingsync.org" does not match the server certificate
13. https://lexicondev.lingsync.org/ SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
14. https://lexicondev.lingsync.org/analysisbytierbyword/inuktitut/nunaqjuaqli%20aaqkiksimalaunngilaq%20sunataqaranilu%20itijuqjuamik%20taaqtualuulluni%20guutiullu%20anirngninga%20ingirralauqpuq%20imaaluup%20qulaagut SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
15. 404 http://www.ark.cs.cmu.edu/TurboParser/nasmith_models/kin-turbo-v1.0.tgz
16. 404 https://github.com/cidles/mindericobot
17. 404 http://gielese.no
18. http://qaamuus.so/ getaddrinfo: Name or service not known
> Dupes 
  1. https://wiki.mozilla.org/B2G
  2. https://github.com/nltk/nltk

Let me know if you have any questions.

RichardLitt commented 8 years ago

@dkhamsing Cool, glad that that bug was fixed.

I'm curious about why [https://wiki.mozilla.org/B2G](https://wiki.mozilla.org/B2G) is counted as a dupe. That seems to me like a pretty simple Markdown case that shouldn't be flagged.

dkhamsing commented 8 years ago

Uh, yeah, the script is not able to distinguish that case. Can you change it in the README?

RichardLitt commented 8 years ago

Done. Will attack the other issues later. This shouldn't be marked as a duplicate, though. An easy check would be to see whether the two URLs are the text and the target of a single [...](...) link.
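The check suggested here could be sketched roughly as follows: collapse any `[url](url)` pair whose text and target are identical into one occurrence before counting duplicates. This is an illustration under stated assumptions, not awesome_bot's code; `SELF_LABELED`, `URL_PATTERN`, and `find_duplicates` are hypothetical names, and the URL pattern is simplified (it stops at parentheses and does not trim trailing punctuation).

```python
import re
from collections import Counter

# Matches [URL](URL) where the link text and the link target are the same URL.
SELF_LABELED = re.compile(r"\[(https?://[^\]\s]+)\]\(\1\)")

# Simplified assumption: a URL ends at whitespace, quotes, brackets, or parentheses.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'\[\]()]+")


def find_duplicates(markdown):
    """Return URLs that occur more than once, ignoring self-labeled links."""
    # Replace each [URL](URL) pair with a single bare URL.
    collapsed = SELF_LABELED.sub(r"\1", markdown)
    counts = Counter(URL_PATTERN.findall(collapsed))
    return sorted(url for url, n in counts.items() if n > 1)


readme = (
    "[https://wiki.mozilla.org/B2G](https://wiki.mozilla.org/B2G)\n"
    "[NLTK](https://github.com/nltk/nltk) and "
    "[source](https://github.com/nltk/nltk)\n"
)
print(find_duplicates(readme))
# ['https://github.com/nltk/nltk']
```

With this approach the B2G link counts once, while a URL that is the target of two separate links is still reported.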

RichardLitt commented 8 years ago

Current issues:

Issues :-(
> Links 
1. 302 https://github.com/RichardLitt/endangered-languages/edit/master/README.md
2. 302 http://wesay.org
3. 301 http://cdec-decoder.org/
4. 404 http://hunspell.cvs.sourceforge.net/hunspell
5. https://victorio.uit.no/langtech/trunk/ SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
6. 404 http://dev.panlex.org/tools/
7. 301 http://ilk.uvt.nl/timbl/
8. 302 http://www.wavesurfer.fm
9. https://dative.lingsync.org/ hostname "dative.lingsync.org" does not match the server certificate
10. https://lexicondev.lingsync.org/analysisbytierbyword/inuktitut/nunaqjuaqli%20aaqkiksimalaunngilaq%20sunataqaranilu%20itijuqjuamik%20taaqtualuulluni%20guutiullu%20anirngninga%20ingirralauqpuq%20imaaluup%20qulaagut SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
11. https://lexicondev.lingsync.org/ SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
12. 404 http://www.ark.cs.cmu.edu/TurboParser/nasmith_models/kin-turbo-v1.0.tgz
13. 404 https://github.com/cidles/mindericobot
14. http://qaamuus.so/ getaddrinfo: Name or service not known
15. 404 http://gielese.no
> Dupes 
  None ✓

dkhamsing commented 8 years ago

@RichardLitt thanks for the feedback

katamaritaco commented 8 years ago

It appears that most of the links flagged in the most recent (Jan 3) run are legit. Should this issue be closed, or should there be another Travis run to check for more broken links?
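One way to triage a report like the ones pasted above is to group the entries by kind: 3xx entries are redirects for links that still resolve, 4xx entries are broken, and entries without a status code are certificate or DNS failures. The sketch below assumes the report format shown in this thread (`N. [status] URL [detail]`, followed by a `> Dupes` section); `REPORT_LINE` and `triage` are hypothetical names, not part of awesome_bot.

```python
import re

# "N. [3-digit status] URL [free-text detail]", as in the reports in this thread.
REPORT_LINE = re.compile(r"^\s*\d+\.\s+(?:(\d{3})\s+)?(https?://\S+)\s*(.*)$")


def triage(report):
    """Group the '> Links' entries of a report by the kind of problem."""
    groups = {"redirect": [], "broken": [], "connection": [], "other": []}
    for line in report.splitlines():
        if line.strip().startswith("> Dupes"):
            break  # the duplicates section is not a list of failures
        match = REPORT_LINE.match(line)
        if not match:
            continue
        status, url, _detail = match.groups()
        if status is None:
            groups["connection"].append(url)  # SSL or DNS error, no HTTP status
        elif status.startswith("3"):
            groups["redirect"].append(url)    # link resolves, but has moved
        elif status.startswith("4"):
            groups["broken"].append(url)      # e.g. 404
        else:
            groups["other"].append(url)       # e.g. 503, possibly temporary
    return groups


sample_report = """> Links
1. 302 http://wesay.org
2. 404 http://gielese.no
3. http://qaamuus.so/ getaddrinfo: Name or service not known
> Dupes
  None
"""
for kind, urls in triage(sample_report).items():
    print(kind, urls)
```

Redirects and temporary server errors can usually be left alone or updated to the new address, so only the "broken" and "connection" groups need a manual look.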

RichardLitt commented 8 years ago

You're right!