eliangcs / pystock-crawler

(UNMAINTAINED) Crawl and parse financial reports (XBRL) from SEC EDGAR, and daily stock prices from Yahoo Finance

get reports returns empty #22

Closed · bernard1 closed this 8 years ago

bernard1 commented 8 years ago

The command is `pystock-crawler reports WBAI -o out.csv`. WBAI is one of the symbols in my symbol list file.

Below is the detailed info. Cheers.

```
2016-08-30 09:40:08+0100 [scrapy] INFO: Command: scrapy crawl edgar -a symbols="WBAI" -t csv -a limit=0,500 -o "/Users/XXXX/out.csv.1"
2016-08-30 09:40:08+0100 [scrapy] INFO: Creating temporary config: /Users/XXXX/scrapy.cfg
2016-08-30 09:40:08+0100 [scrapy] INFO: Scrapy 0.24.4 started (bot: pystock-crawler)
2016-08-30 09:40:08+0100 [scrapy] INFO: Optional features available: ssl, http11
2016-08-30 09:40:08+0100 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'pystock_crawler.spiders', 'FEED_URI': '/Users/XXXX/out.csv.1', 'LOG_LEVEL': 'INFO', 'SPIDER_MODULES': ['pystock_crawler.spiders'], 'HTTPCACHE_ENABLED': True, 'RETRY_TIMES': 4, 'BOT_NAME': 'pystock-crawler', 'COOKIES_ENABLED': False, 'FEED_FORMAT': 'csv', 'HTTPCACHE_POLICY': 'scrapy.contrib.httpcache.RFC2616Policy', 'HTTPCACHE_STORAGE': 'scrapy.contrib.httpcache.LeveldbCacheStorage'}
2016-08-30 09:40:08+0100 [scrapy] INFO: Enabled extensions: FeedExporter, LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, PassiveThrottle, SpiderState
2016-08-30 09:40:08+0100 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, ChunkedTransferMiddleware, DownloaderStats, HttpCacheMiddleware
2016-08-30 09:40:08+0100 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-08-30 09:40:08+0100 [scrapy] INFO: Enabled item pipelines:
2016-08-30 09:40:08+0100 [edgar] INFO: Spider opened
2016-08-30 09:40:08+0100 [edgar] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-08-30 09:40:10+0100 [edgar] INFO: Closing spider (finished)
2016-08-30 09:40:10+0100 [edgar] INFO: Dumping Scrapy stats:
	{'delay_count': 0,
	 'downloader/request_bytes': 608,
	 'downloader/request_count': 2,
	 'downloader/request_method_count/GET': 2,
	 'downloader/response_bytes': 3253,
	 'downloader/response_count': 2,
	 'downloader/response_status_count/200': 1,
	 'downloader/response_status_count/301': 1,
	 'finish_reason': 'finished',
	 'finish_time': datetime.datetime(2016, 8, 30, 8, 40, 10, 300817),
	 'httpcache/firsthand': 1,
	 'httpcache/hit': 1,
	 'httpcache/miss': 1,
	 'httpcache/uncacheable': 1,
	 'log_count/INFO': 7,
	 'request_depth_count/0': 1,
	 'response_received_count': 1,
	 'scheduler/dequeued': 2,
	 'scheduler/dequeued/memory': 2,
	 'scheduler/enqueued': 2,
	 'scheduler/enqueued/memory': 2,
	 'start_time': datetime.datetime(2016, 8, 30, 8, 40, 8, 704882)}
2016-08-30 09:40:10+0100 [edgar] INFO: Spider closed (finished)
2016-08-30 09:40:10+0100 [scrapy] INFO: Deleting /Users/XXXX/scrapy.cfg
2016-08-30 09:40:10+0100 [scrapy] INFO: Merging files to /Users/XXXX/out.csv
2016-08-30 09:40:10+0100 [scrapy] INFO: Deleting /Users/XXXX/out.csv.1
```

eliangcs commented 8 years ago

There are no 10-K or 10-Q filings for WBAI: https://www.sec.gov/cgi-bin/browse-edgar?CIK=WBAI&Find=Search&owner=exclude&action=getcompany&type=10-
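If you want to check this programmatically, here is a minimal sketch (not part of pystock-crawler) that hits the same browse-edgar endpoint with `output=atom` and lists the filing types it returns. The `list_filings` helper is made up for illustration; only the endpoint and query parameters come from the URL above.

```python
import requests
import xml.etree.ElementTree as ET

ATOM_NS = '{http://www.w3.org/2005/Atom}'

def list_filings(symbol, filing_type):
    # output=atom makes browse-edgar return machine-readable Atom XML
    # instead of the HTML page linked above.
    resp = requests.get(
        'https://www.sec.gov/cgi-bin/browse-edgar',
        params={
            'action': 'getcompany',
            'CIK': symbol,
            'type': filing_type,
            'owner': 'exclude',
            'count': 40,
            'output': 'atom',
        },
        # The SEC nowadays asks for a descriptive User-Agent; adjust as needed.
        headers={'User-Agent': 'example test@example.com'},
        timeout=30,
    )
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    # Each <entry> is one filing; its form type is in <category term="...">.
    return [entry.find(ATOM_NS + 'category').get('term')
            for entry in root.iter(ATOM_NS + 'entry')]

print(list_filings('WBAI', '10-'))  # expected: [] (no 10-K/10-Q filings)
print(list_filings('WBAI', '6-K'))  # likely non-empty if WBAI files 6-Ks
```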

bernard1 commented 8 years ago

Thanks, I saw that as well, but it's very strange. This company should file a 10-Q or 10-K, so where are they filed? The company is based in China, and I think all Chinese companies have the same issue.

bernard1 commented 8 years ago

Foreign companies don't have to file a 10-Q or 10-K; they usually file their quarterly reports as 6-K. Could you please add an option to the command to grab those? Otherwise I can check whether the code can be changed. Anyway, thank you.

eliangcs commented 8 years ago

I don't plan to support reports other than 10-K and 10-Q, and I doubt the current parser can handle 6-K reports. But you can try it by modifying the URL at https://github.com/eliangcs/pystock-crawler/blob/master/pystock_crawler/spiders/edgar.py#L19 so that it doesn't filter out filing types that don't begin with "10-".
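For concreteness, a sketch of the kind of one-line change being described. The template below is an assumption based on EDGAR's browse-edgar query format, not the verified contents of edgar.py line 19, so check the linked file for the actual string.

```python
# Assumed shape of the start-URL template in pystock_crawler/spiders/edgar.py
# (line 19 in the linked revision) -- verify against the actual file.
URL_TEMPLATE = ('https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany'
                '&CIK=%s&type=10-&dateb=&owner=exclude&count=300')

# To crawl 6-K filings instead, change the "type" query parameter:
URL_TEMPLATE = URL_TEMPLATE.replace('type=10-', 'type=6-K')

# Caveat from the comment above: even if the spider then finds 6-K filings,
# the XBRL parser was written for 10-K/10-Q documents and may not extract
# anything useful from 6-K exhibits.
```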