559eddb / miner-highlearn

Automatically exported from code.google.com/p/miner-highlearn

Some files aren't downloaded - HUJI & TAU #2

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Originally submitted by: Amir Luzon <amirluzon@gmail.com>

What steps will reproduce the problem?
1. There are some courses (FIND OUT WHICH ONES EXACTLY)
2. Hit download - As usual
3. The files are missing from the "incoming" folder

What version of the product are you using? On what operating system?
Version 0.7.2 - WinXP or Vista

Original issue reported on code.google.com by Wolf1...@gmail.com on 1 Mar 2009 at 11:12

GoogleCodeExporter commented 9 years ago
Darn it I've missed this report :(

OK, so it seems that HUJI isn't working 100%.

We need to verify how to reproduce the problem: certain file types or specific
courses. I can't deal with a problem that I can't reproduce.

Original comment by Wolf1...@gmail.com on 29 Apr 2009 at 4:21

GoogleCodeExporter commented 9 years ago
I have files missing, too.
I'm at BGU, not HUJI.

I don't know if it's the same issue, because I get a message at the end saying
that some files were skipped and it also says "(Probably OK)", I'm not sure why.

When I try to DL them manually, it's fine.

The files were all PDFs, but other PDFs from the same course came down fine.
Even from the same directory. I couldn't even find a pattern like "file name is
in Hebrew" that differentiates these files.

I'm sorry I'm not more helpful. If you have any idea that you'd like to
investigate, let me know.

Original comment by NoamNelke on 23 May 2009 at 10:08

GoogleCodeExporter commented 9 years ago
OK, this seems to be confirmed.

This problem also seems to appear at TAU (during the last couple of months).

I can only deduce that there were some changes to the Highlearn system (a
change of font, or a change of some words in the final HTML that holds the
actual HTTP links).

This isn't easy to fix, but this is what should be done:
1. Find a course with enough files that don't behave.
2. Use an HTTP sniffer to record the final HTML that miner-highlearn downloads
from the server.
3. Compare it to the final HTML that is downloaded manually using
Internet Explorer.
4.a. If the HTML files are the same: good, just fix the regex that finds the
actual link.
4.b. If the files are different: damn, back-trace to the previous file
downloaded and search for differences.
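
Steps 3 and 4 could be sketched roughly like this in Python, assuming the two
captures from step 2 are already available as strings. The names LINK_RE,
extract_links and compare_captures are made up for illustration, and the
pattern is only a generic href matcher; the actual regex that miner-highlearn
uses is not shown in this thread.

```python
import difflib
import re

# Hypothetical pattern: a generic href matcher, NOT the regex that
# miner-highlearn really uses (that one is not quoted in this issue).
LINK_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html, link_re=LINK_RE):
    """Return the set of link targets that the pattern finds in the HTML."""
    return set(link_re.findall(html))


def compare_captures(miner_html, browser_html, link_re=LINK_RE):
    """Compare the HTML fetched by the tool with the HTML fetched manually.

    Returns (same, diff, missed):
      same   - True if both captures are byte-for-byte identical (case 4.a)
      diff   - unified diff lines between the captures (useful for case 4.b)
      missed - links found in the browser capture but not in the tool capture
    """
    same = miner_html == browser_html
    diff = list(difflib.unified_diff(
        miner_html.splitlines(),
        browser_html.splitlines(),
        fromfile="miner",
        tofile="browser",
        lineterm="",
    ))
    missed = sorted(extract_links(browser_html, link_re)
                    - extract_links(miner_html, link_re))
    return same, diff, missed


if __name__ == "__main__":
    # Tiny made-up captures, just to show the call.
    miner = '<a href="a.pdf">A</a>'
    browser = '<a href="a.pdf">A</a>\n<a HREF=\'b.pdf\'>B</a>'
    same, diff, missed = compare_captures(miner, browser)
    print("identical:", same)
    print("\n".join(diff))
    print("links missing from the tool capture:", missed)
```

If `same` comes back True, the captures match and the link-finding regex is
the suspect (4.a); otherwise the diff shows where the two responses diverge
(4.b).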

I'm writing this because I have no intention of fixing this any time soon,
I'm finishing my studies this semester!!!! :)

If anyone tries to do this, you're welcome to ask for advice using GTalk or
ICQ.

Cheers

Original comment by Wolf1...@gmail.com on 4 Jun 2009 at 9:40