Gunratan / edgar

Tool for the U.S. SEC EDGAR Retrieval and Parsing of Corporate Filings

Issue in downloading Master Indexes #4

Open tdp-datsci opened 2 years ago

tdp-datsci commented 2 years ago

I am having an issue getting any of the Master Indexes to download. Every file in the folder that gets created is approximately 5 KB in size. I am using my email address as the useragent and have tried both single and double quotes around it, with the same result.

Edgar version 2.04. All other packages are current.

Sample script:

test <- getBusinDescr(320193, 2014, useragent)

donboyd5 commented 2 years ago

I have run into the same issue. I have a feeling that the SEC may have changed access rules and this package and at least one other package, edgarWebR, may not have had a chance to keep up.

I say that because if I run the following example from this package:

output <- getDailyMaster('08/09/2016', useragent)
head(output)

It creates a file with the expected name (daily_idx_20160809), but if you open it with a text editor, it is not an index file but an HTML file. After stripping out the HTML tags, the relevant part is:

Your Request Originates from an Undeclared Automated Tool. To allow for equitable access to all users, SEC reserves the right to limit requests originating from undeclared automated tools. Your request has been identified as part of a network of automated tools outside of the acceptable policy and will be managed until action is taken to declare your traffic. Please declare your traffic by updating your user agent to include company specific information. For best practices on efficiently downloading information from SEC.gov, including the latest EDGAR filings, visit https://www.sec.gov/developer. You can also sign up for email updates on the SEC open data program, including best practices that make it more efficient to download data, and SEC.gov enhancements that may impact scripted downloading processes. For more information, contact opendata@sec.gov
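A quick way to spot this without opening each file by hand is to check whether a download contains HTML or the block-page heading instead of index rows. The following is a minimal sketch in base R. `looks_like_block_page` is a hypothetical helper, not part of the edgar package, and the demo files are made-up stand-ins for whatever the package wrote on your machine.

```r
# Hypothetical helper (not part of edgar): returns TRUE if the file looks
# like the SEC's HTML block page rather than a pipe-delimited index file.
looks_like_block_page <- function(path, n = 200L) {
  # Read only the first n lines; the block page is short.
  lines <- readLines(path, n = n, warn = FALSE)
  text <- tolower(paste(lines, collapse = " "))
  grepl("<html", text, fixed = TRUE) ||
    grepl("undeclared automated tool", text, fixed = TRUE)
}

# Demo with two temporary files standing in for real downloads.
blocked <- tempfile()
writeLines(c("<!DOCTYPE html>",
             "<html><body>Your Request Originates from an Undeclared Automated Tool</body></html>"),
           blocked)

index <- tempfile()
writeLines(c("CIK|Company Name|Form Type|Date Filed|File Name",
             "0|EXAMPLE CO|10-K|20160809|edgar/data/0/example.txt"),
           index)

print(looks_like_block_page(blocked))  # TRUE
print(looks_like_block_page(index))    # FALSE
```

Running the helper over every file in the download folder would show at a glance whether all of the small files are block pages.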

I have double and triple checked that I am using the right information in useragent but nothing I do gets the download to work. And in edgarWebR I have double and triple checked that I am setting its environment variable (EDGARWEBR_USER_AGENT) properly, but nothing works there, either.
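To separate a package problem from a user-agent problem, the same kind of request can be made directly from base R with the User-Agent set explicitly. The sketch below is only an illustration under stated assumptions: `make_user_agent` and `fetch_with_user_agent` are hypothetical helpers (not part of edgar or edgarWebR), the name-plus-email format simply mirrors the kind of string the SEC message asks for, and the URL in the usage comment assumes EDGAR's quarterly full-index layout. It relies on base R's `HTTPUserAgent` option, which `download.file()` uses for its User-Agent header.

```r
# Hypothetical helper: build a "Name email" user-agent string and reject
# obviously malformed input (assumed format, not an official SEC rule).
make_user_agent <- function(name, email) {
  stopifnot(
    nzchar(name),
    grepl("^[^@[:space:]]+@[^@[:space:]]+\\.[^@[:space:]]+$", email)
  )
  paste(name, email)
}

# Hypothetical helper: download one URL with the given User-Agent.
# The HTTPUserAgent option is set only for the duration of the call.
fetch_with_user_agent <- function(url, destfile, useragent) {
  old <- options(HTTPUserAgent = useragent)
  on.exit(options(old), add = TRUE)
  utils::download.file(url, destfile, quiet = TRUE, mode = "wb")
  invisible(destfile)
}

ua <- make_user_agent("First Last", "address@domain.com")
print(ua)

# Usage (needs network access; the URL layout is an assumption):
# fetch_with_user_agent(
#   "https://www.sec.gov/Archives/edgar/full-index/2016/QTR3/master.idx",
#   "master_2016_QTR3.idx", ua)
#
# For edgarWebR, the environment variable named above can be set with:
# Sys.setenv(EDGARWEBR_USER_AGENT = ua)
```

If a direct download like this returns a real index while the package call still produces the HTML page, that would point at how the package sends its request rather than at the user-agent string itself.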

So I am guessing the SEC has changed either what is required for access, or placed limits on access, but I am a newcomer to all of this and haven't figured out the correct solution.

It would be great if someone knowledgeable about this could comment.

t-emery commented 2 years ago

I'm getting the same behavior described by donboyd5 above. I too have double-checked that I have properly formatted the information in the useragent string. I get the same message posted above for both getDailyMaster() and getMasterIndex(), using the examples provided in the documentation (with my own useragent string).

I agree: my first suspicion is that there has been a change in the API.

Using: edgar_2.0.4, useragent = "First Last address@domain.com"

TylerPantuso commented 2 years ago

Definitely an issue that the SEC needs to fix on their end. It returns an HTML page headed, "Your Request Originates from an Undeclared Automated Tool," even when the User-Agent is declared exactly like the SEC requires.

tdp-datsci commented 2 years ago

Just a quick update regarding this issue. There is an update to the Edgar package (2.05), which was posted to CRAN in February 2022. I am not certain why Edgar 2.04 is still listed on Mr. Gunratan's page rather than the update, but after updating, the package does appear to work as expected. I have not done extensive testing, but my initial tests seem promising.
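For anyone checking whether they are still on the older release, a small sketch in base R follows. `needs_update` is a hypothetical helper, and the default minimum of "2.0.5" is an assumption that the "2.05" mentioned above corresponds to that CRAN version string; adjust it if CRAN shows something different.

```r
# Hypothetical helper: TRUE if the installed version is older than the
# minimum. The default "2.0.5" is an assumed reading of "2.05" above.
needs_update <- function(installed, minimum = "2.0.5") {
  package_version(installed) < package_version(minimum)
}

# Only inspect the package if it is actually installed.
if (requireNamespace("edgar", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("edgar"))
  if (needs_update(installed)) {
    message("edgar ", installed, " is installed; consider install.packages(\"edgar\")")
  } else {
    message("edgar ", installed, " is at or above the assumed fixed version")
  }
}
```

Comparing with `package_version()` rather than as plain strings keeps the ordering correct for versions such as 2.0.10.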