VijayQin / DMHY-spider

This project crawls DMHY and stores the HTML page and torrent of each animation in the local file system and in a database (SQLite3). Future work will filter the animations we want by given rules and alert us to those updated each day.

When a torrent is deleted on the website, we get an HTTPError, errno 404 #6

Closed VijayQin closed 8 years ago

VijayQin commented 8 years ago

When parsing https://share.dmhy.org/topics/view/438761_Qualidea_Code_04_MP4_720p.html#description-end we get urllib2.HTTPError: HTTP Error 404: Not Found, because the torrent linked from the page was deleted.

VijayQin commented 8 years ago

This issue is fixed by adding try...except when fetching the torrent:

    try:
        u = urllib2.urlopen(torrent_url)
        torrent = u.read()
        u.close()
        fileName = self.formulate_title(torrent_url.split('/')[-1])
        file_path = self.prune_title(path, fileName)
        # with open(file_path, 'wb') as f :
        #     f.write(torrent)
        with DMHY_Write_file_exception(file_path, 'wb', url) as f:
            f.write(torrent)
    except urllib2.HTTPError as e:
        if 404 == e.code:
            # The torrent was deleted on the website: report it and move on.
            print "HTTPError error({0}): {1}".format(e.code, e.reason)
        else:
            print e.code
            print e.reason
            raise
    except Exception as e:
        # A generic exception has no .code or .reason, so print it directly.
        print e
        raise
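
For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched roughly as follows. This is only an illustration, not the project's code: the function name `fetch_torrent` and its `opener` parameter are hypothetical, and the file-writing step is left out. It treats a 404 as "torrent deleted" and re-raises every other HTTP error.

```python
import urllib.error
import urllib.request


def fetch_torrent(torrent_url, opener=urllib.request.urlopen):
    """Return the torrent bytes, or None if the server answers 404.

    `opener` is a hypothetical hook (defaulting to urlopen) so the
    function can be exercised without touching the network.
    """
    try:
        response = opener(torrent_url)
        try:
            return response.read()
        finally:
            response.close()
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # The torrent was deleted on the website: report it and skip it.
            print("HTTPError error({0}): {1}".format(e.code, e.reason))
            return None
        # Any other HTTP status is unexpected here, so let it propagate.
        raise
```

A caller would check for `None` before writing the file, so a deleted torrent no longer stops the crawl.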