d3an / finviz

Go API for Finviz
https://github.com/d3an/finviz/wiki
MIT License

Question: Should we be using pointers to channels? #90

Closed: phllpmcphrsn closed this issue 1 year ago

phllpmcphrsn commented 1 year ago

https://github.com/d3an/finviz/blob/c781b63de5417e58bd6c6f34f57bc978c9139bca/screener/screener.go#L160

I haven't come across this yet, so it caught my eye. After an initial search, I see that it's not recommended due to the memory implications. Can you explain why a pointer to a channel is used? (Still learning Go, btw.)

d3an commented 1 year ago

Just for context: I wasn't really thinking about best practice, more about how to accomplish what I needed to do, which required some out-of-the-box thinking. It's been a while since I wrote the code, but here goes.

With regard to Finviz screeners, users can specify an order or ranking in which their results should be presented. If the Finviz query returns $n \leq 20$ results, then using goroutines by value instead of by reference is the simple answer (that is, if you want to use goroutines at all for such a trivial task, ~1 GET request). However, a more general query could potentially have thousands of results. Since Finviz only returns 20 results per page, and each page requires a new request, the number of pages (and hence queries) is $k = \lceil n / 20 \rceil$.
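For concreteness, here is a quick sketch of that page count in Go (`pageCount` is a made-up helper for illustration, not something from the repo):

```go
package main

import "fmt"

// pageCount returns the number of result pages for a query with n rows,
// given Finviz's 20 rows per page. (n + 19) / 20 is integer ceiling
// division, so a partial final page still counts as a full page.
func pageCount(n int) int {
	return (n + 19) / 20
}

func main() {
	fmt.Println(pageCount(20))  // 1
	fmt.Println(pageCount(95))  // 5
	fmt.Println(pageCount(100)) // 5
}
```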

However, the major issue here is that all of these pages need to have their order preserved, and unfortunately there's no guarantee that the goroutines will return in the order they are initiated. So after some searching, I came across a post similar to this one and followed a similar technique.

To preserve order, I initialize an array of length $k$, the number of pages in the query. Each element of the array is of type scrapeResult, so the necessary memory is pre-allocated up front. Since the first page has already been requested (I need to parse it to determine how many pages remain), the next step is to launch goroutines for the remaining pages, passing each one the required URL, a pointer to the master wait group, and a pointer to its array entry.

Upon completion (whether by error or success), each goroutine notifies the wait group and closes the channel in its respective array entry. Once all the goroutines have completed, I loop through the array and check whether each request errored out or succeeded. If there's a significant error that can't be avoided by changing some request parameters or adding a time delay, then the entire function call can fail; in my experience, as long as users are somewhat conservative with their queries, this won't happen. Once all the goroutines succeed, the results are passed on to processing, which involves DataFrame generation and type cleaning.
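Here is a minimal sketch of that pattern with placeholder names throughout (`scrapeResult`'s fields, `fetchPage`, and the page URLs are all made up for illustration). It writes through a plain pointer to each slice entry, whereas the actual code, as described above, stores a channel in each entry and closes it on completion:

```go
package main

import (
	"fmt"
	"sync"
)

// scrapeResult is a stand-in for the repo's per-page result type.
type scrapeResult struct {
	rows []string
	err  error
}

// fetchPage stands in for the real GET request + HTML parsing.
func fetchPage(url string) ([]string, error) {
	return []string{"rows from " + url}, nil
}

// scrape fetches one page and writes the outcome into the caller's slot.
func scrape(url string, wg *sync.WaitGroup, out *scrapeResult) {
	defer wg.Done()
	rows, err := fetchPage(url)
	*out = scrapeResult{rows: rows, err: err}
}

func main() {
	// Pages 2..k; page 1 was already fetched to learn the page count.
	urls := []string{"page=2", "page=3", "page=4"}
	results := make([]scrapeResult, len(urls)) // pre-allocated, one slot per page

	var wg sync.WaitGroup
	for i := range urls {
		wg.Add(1)
		go scrape(urls[i], &wg, &results[i]) // pointer to this page's slot
	}
	wg.Wait()

	// The slice is already in page order, regardless of completion order.
	for _, r := range results {
		if r.err != nil {
			fmt.Println("request failed:", r.err)
			continue
		}
		fmt.Println(r.rows)
	}
}
```

Because each goroutine writes through `&results[i]`, the slice is already in page order once `wg.Wait()` returns, with no sorting or reassembly step needed.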

EDIT: Giving it a second look, I think I actually run the goroutines sequentially, i.e., only one is ever executing at a time. Essentially, I wait for a completion signal from the wait group before launching the next goroutine. That was probably my solution to those memory issues; more importantly, I think it prevented the site from treating your query as a DDoS attack.
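In sketch form, reusing the `scrape`, `urls`, and `results` placeholders from the example above, that sequential variant would look something like:

```go
// Sequential variant: block on the wait group after each launch, so only
// one scrape goroutine is ever in flight at a time.
for i := range urls {
	var wg sync.WaitGroup
	wg.Add(1)
	go scrape(urls[i], &wg, &results[i])
	wg.Wait() // this page must finish before the next request starts
}
```

Functionally this is equivalent to calling `scrape` synchronously; keeping the goroutine-plus-wait-group shape preserves the concurrent version's structure while throttling the scraper to one request at a time.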

P.S. It is possible to get your IP address hard blocked by their Cloudflare layer.