Pirate-Weather / pirateweather

Code and documentation for the Pirate Weather API
Apache License 2.0
652 stars, 29 forks

Reasons I cannot use Pirate Weather in my apps #50

Closed oisact closed 1 year ago

oisact commented 1 year ago

First of all thanks for providing an alternative weather API. Unfortunately, after spending $25 with the intention of evaluating this in my apps for a month, I had to switch away to another service (using WeatherKit) after just a few days. The following are all reasons why I had to discontinue using your API, and I hope this may illustrate items that you might want to address in case other developers have needs similar to mine.

  1. Inaccurate high temperatures in the evening (I believe after 7 PM). During the evening, the forecast for the current (zero index) day shows a seemingly random high temperature, i.e. not the high for the current day or the next day. This temperature appears to represent the highest temperature for the rest of the daily forecast range, as it is greater than the high for the current or next day. For example, today's high was only 62. Tomorrow's high is 68, the next day is 69. However, the first entry in the daily->data array shows a high of 72.57. This appears to be the high temperature 3 days from now, and is grossly incorrect for the current day forecast.

  2. API latency. The minimum API response time I see is over 1000 ms. This is far too slow for an API of this kind, and is probably the result of the on-demand AWS Lambda functions having to dig into a massive dataset for each request. Sure, this is convenient on your end, for scalability, ease of implementation and only paying AWS for the minimum processing you need, but it comes at a performance cost for all end users. If you really want to provide nationwide coverage, then pre-generating static JSON in 3km grids would result in vastly faster response times, and if your service grows in use, will actually require less processing on average as well. Regardless, this latency is directly passed down to my app users (although I cache county-wide data in 15 minute intervals to save on API calls, so at least some of the requests are cache hits), and it is far too slow.

  3. Cost. I already pay $100 a year for the Apple Developer Program, so basically I get 500k REST weather requests a month free. However, even if I didn't, it would still only cost $8.33 a month for 500k requests purely to access weather from Apple. Pirate Weather costs $25 a month for 300k requests, which is 3 times as much for fewer requests. Further, it isn't clear what happens when I exceed 300k requests, or how one goes about paying for a plan to get more than 300k requests per month.

  4. There is no way (that I could find) to track my API request usage. Sure, I can try and count this on my end, but whatever the service itself is counting is what really matters, as that is the limit that, when exceeded, could immediately cause my app's weather to cease functioning for the remainder of the billing period.

  5. The Minutely, Hourly and Daily "summary" is no longer a summary as it was in DarkSky, but is just a precipitation condition code, like "Clear" or "Rain". The "Clear" value is apparently used to represent any condition that is not precipitating. Thus overcast, foggy, cloudy, etc. all have a summary of "Clear", which is highly inaccurate. The summary fields were, for my use, the highest value piece of information in the entire forecast dataset (with the possible exception of the low and high temp). Your API does not generate them, so it cannot serve as a drop-in DarkSky replacement for me, which is how the service is advertised.

  6. The code used by the API is not open source, as the true core of the implementation, which is whatever is generating the JSON output, has not been released. I believe this is the code that is in error or deficient as discussed above, and the lack thereof also prevents me from rolling my own internal processing of NWS data to pre-calculate the data I need instead of doing so only on demand. So I am unable to fix, enhance or increase the performance of this service. It just seems to advertise as an open source service when it isn't.
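
To make item 1 concrete, here is a minimal sketch of a day-zero high computed only from the hourly entries that fall on one local calendar day, rather than from the whole forecast range. The `time` and `temperature` keys follow the Dark Sky style response layout; the function name `daily_high`, the fixed UTC offset, the date and the sample data are assumptions for illustration, not Pirate Weather's actual code.

```python
from datetime import datetime, timedelta, timezone


def daily_high(hourly, day, utc_offset_hours=0):
    """Return the max temperature among hourly entries on local `day`.

    hourly: list of dicts with "time" (UNIX seconds) and "temperature".
    day: a datetime.date in the location's local time.
    Returns None if no hourly entry falls on that day.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    temps = [
        h["temperature"]
        for h in hourly
        if datetime.fromtimestamp(h["time"], tz).date() == day
    ]
    return max(temps) if temps else None


# Hypothetical data: three days of hourly temperatures peaking at
# 62, 68 and 72.57 (the figures quoted in item 1; the date is made up).
start = datetime(2022, 11, 1, tzinfo=timezone.utc)
peaks = [62.0, 68.0, 72.57]
hourly = [
    {
        "time": int((start + timedelta(hours=h)).timestamp()),
        # Warmest at hour 15 of each day, one degree cooler per hour away.
        "temperature": peaks[h // 24] - abs((h % 24) - 15),
    }
    for h in range(72)
]

today_high = daily_high(hourly, start.date())          # restricted to day zero
range_high = max(h["temperature"] for h in hourly)     # whole forecast range
```

With this sample data, `today_high` is the day's own peak, while `range_high` picks up the warmer value from two days later, which is the kind of mismatch described above.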

Gr3q commented 1 year ago

I have the same concerns in the inverse priority order.

alexander0042 commented 1 year ago

Hi,

Thank you so much for taking the time to write out this list- your points here are clear, well laid out, and well taken on my end. I came to this project from a data processing standpoint, which means I'm very new to thinking about a lot of the concerns you're raising.

While I can't promise I'll have terribly satisfying answers, I'll do my best to address these in turn:

  1. This is a long-standing issue, and relates to a more systemic problem of how I handle recent historic data. I set everything up going forward from an initial forecast, so anything outside of this paradigm creates a ton of issues. As it stands now, it's not random, but not as good as it should be. I have a solid plan for how to address this, but it involves changing my processing pipeline, so not as easy of a fix as I would like. If you're interested in the full discussion, it's all in issue #5.

  2. Yup, your guess here is spot on: the time is what's required for Lambda to go and read the data and process everything. It works well enough at the moment for a lot of non-time-sensitive applications, but I completely understand why it's too slow. The only bit of good news here is that I've improved this a lot over the past year (going from ~1500 ms to ~800 ms), and I have some longer term ideas to get this below ~500 ms, but I don't think it'll ever be faster than that.

  3. This isn't something I'd heard before, so appreciate hearing this! I picked the prices to be about 75% of Dark Sky, but I should take a look at what Apple Weather and other providers are up to. My main concern is being able to cover the costs of the free tier, so once I get a better idea of what the ratio of paid/ unpaid subscribers is, I'll hopefully be able to bring this down.

  4. This is a big issue, and has become the top priority on my roadmap! I'm getting close to the limits imposed by AWS on their API gateway, so I'm moving to Kong, which will return the number of requests as a header. At the current rate of signup, I'll hit the AWS limit in the next month, so I'll have to get this working before then!

  5. Another big issue, and one I was working on before I realized how close I was to the API limits. This comes down to a giant series of IF statements, so not a particularly complex problem, but one that takes a while to write up.

  6. Great point, and compared to the other issues, technically easy to fix at least! I really want this to be open and transparent, hence the variable documentation, but I also have an existential fear of Amazon or someone coming along and just replicating this service on their own. My gut says that I shouldn't worry too much about that, and I'd be fine to just publish the script (it's only 2000 lines of Python, so not exactly groundbreaking), but it's something I'm still mulling over. In the meantime, I'm always happy to explain how anything is calculated, and you should always be able to reference the original model data.
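
As a rough illustration of the "giant series of IF statements" mentioned in item 5, here is a minimal sketch that picks a short text summary from a few forecast fields. The parameter names are loosely modeled on Dark Sky style fields (precipitation intensity and type, cloud cover, visibility), and every threshold and label is an assumption for illustration, not a value taken from Dark Sky or Pirate Weather.

```python
def summarize(precip_intensity, precip_type, cloud_cover, visibility_km):
    """Pick a short text summary from a few forecast fields.

    precip_intensity: precipitation rate (assumed mm/h)
    precip_type: e.g. "rain" or "snow", or None when dry
    cloud_cover: fraction of sky covered, 0.0 to 1.0
    visibility_km: visibility in kilometres

    All thresholds below are illustrative assumptions.
    """
    # Precipitation takes priority over sky conditions.
    if precip_intensity > 0 and precip_type:
        strength = "Light" if precip_intensity < 2.5 else "Heavy"
        return f"{strength} {precip_type}"
    # Only fall back to fog and cloud cover when it is dry.
    if visibility_km < 1:
        return "Foggy"
    if cloud_cover >= 0.875:
        return "Overcast"
    if cloud_cover >= 0.375:
        return "Partly Cloudy"
    return "Clear"
```

With these assumed thresholds, a dry hour with 95% cloud cover comes back as "Overcast" instead of "Clear", which is the distinction requested in the original item 5.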

To be perfectly honest, I've been blown away by the interest in this, and so I really apologize that it's pretty rough around the edges. In the last month, the level of traffic has grown 10x, which is incredibly exciting and motivating, but also raises the expectations a lot. I'm hopeful that in a few months this will have improved to the point I can address some of these concerns, and again, appreciate the time you took to write everything out.

oisact commented 1 year ago

Thanks for your extremely detailed response! Some further comments on my end.

  1. I think the ideal solution, once your usage increases to a certain point, is to precalculate the entire country (or at the least precalculate any 3km cells that have been requested recently) into static json that can be served with practically zero overhead (or even better, purely by a CDN). As an example, say in a very densely populated area you receive 500 requests per time bucket (I think I read you process in 15 minute intervals), so you are repeating the same processing 500 times to output the same data. It would be far more economical to use a standard EC2 compute instance to be doing that work every 15 minutes than on-demand Lambda per request. The performance of your service, to the end user, would be incredibly fast.

  2. WeatherKit is the official replacement for DarkSky, but it is neither a drop-in replacement (it is extremely similar but many field names have changed), nor does it contain any text forecasts at all (same exact issue as item 5). So although it is "free" for existing app developers like myself, it still does not suit my needs. With regard to cost, my normal monthly DarkSky requests were in the low 100,000s. Looking back at my invoices, my monthly costs were around $10-$12. With your service pricing tiers my only option was to use your most expensive "all the data" tier at $25. So although you priced your service at 75% of DarkSky's rate for 300k requests, I am paying more than double what I paid DarkSky for my 100k requests.

  3. As I mentioned in item 2, WeatherKit is not providing this either. When I chose DarkSky several years ago it was because of the extremely good text forecasts their API provided over anything else I evaluated. I think I was using Weather Underground API up until that point (and they stopped providing their service or began charging a ridiculous amount to use it - I can't remember which), and I was extremely impressed with DarkSky's text forecasts. They were both extremely detailed and very concise (i.e. compact and not wordy). I think there is a big opportunity here for you to fill the gap in text based forecasts. In my apps, the display of the day's weather forecast and current conditions is secondary to the primary information displayed in my app (car accidents, power outages, weather alerts, school closings, earthquakes...), and I do not need a robust data set, just a simple, concise text based forecast. At this point my plan is to implement my own routines to generate that from the lower-level data, but hopefully you'll beat me to it.

  4. For me personally, I have flirted with the idea of getting the data I need directly from NWS for a long time now, and I implemented back end code to consume from their "new" higher-level API (https://weather-gov.github.io/api/general-faqs) when it went live a few years ago, but their new API was never stable enough for me to actually use it. Plus it is missing some very basic pieces of information, which appears to be pure oversight on their end. So I made the decision to consume weather data from 3rd parties (DarkSky, Pirate Weather, etc.) until I could make the "big" leap and consume the GRIBs directly. However, on my end I would be processing the GRIBs on a fixed schedule so the data my apps need is sitting there as static JSON ready for them to consume. So again, for me personally, your implementation (on-demand Lambda based) would not suit my needs. I'm also very loath to commit to any provider's proprietary services (I have steered the companies I work for away from Lambda on multiple occasions, and this has saved a tremendous amount of money), and in general I prefer a LAMP stack for the massive amount of flexibility it provides on the hosting side. However, I'm always happy to look at code for purposes of understanding data formats or what hoops I need to jump through. Still, as a fellow developer I totally understand your stance, and that is a totally legitimate concern.

Gr3q commented 1 year ago

  1. You could release it under the GPL or a similar license. I understand your concerns, but if it's not released, it prevents people (including me) from contributing to that part of the project and pushes all the development burden onto you.

tannewt commented 1 year ago

I'd love to contribute to the frontend code too. That'd enable non-json response formats that could be more memory efficient.

dzungpv commented 1 year ago

@oisact I have a weather app too, and I share your concerns. Here are some reasons I will not be using Pirate Weather in my apps:

  1. The author wants to start a weather service built on open data, but he will not open source all of the code. He only shares some of it to gain the trust of former Dark Sky users.

  2. The data quality is nowhere near that of the old Dark Sky, or of the new version in Apple's hands, which draws on many commercial data sources. I have a developer account there and use it.

  3. Many open source projects publish all of their code and still earn money from it, like Home Assistant or Ubuntu Linux. There are many ways to make money back without closing the source code. Or they at least keep everything open until the product reaches a stable level.