PhDEcon108 / HWs-Ana-Gomes

Midterm Project - theme #1

Open AnaRitaGomes-3 opened 2 years ago

AnaRitaGomes-3 commented 2 years ago

Good afternoon professor @niskrev !

I'd like to use the scikit-hts package and apply it to an example similar to the Australian one. Would applying this example to Portugal, subdivided into NUTS I, II and III regions, be suitable for the Midterm project? The data would come from Pordata (document "Pordata" in this folder).

Thank you!

Ana Rita Gomes

niskrev commented 2 years ago

Hi @AnaRitaGomes-3 I haven't used this package, but think it is interesting and worth exploring. It is also fine if you use it for both the midterm and final projects. For example, if there is a published paper using the Australian data, you can try to replicate it with Portuguese data as your final project, and use the midterm project just to familiarize yourself with the scikit-hts package, using a more basic example. Or do some part of the work in the midterm project and then expand on it in the final.

AnaRitaGomes-3 commented 2 years ago

Thank you for the answer, professor @niskrev! I put a "draft" of the project in the "Midterm_Project" folder. Basically, I created a hierarchy between the total for Portugal and the NUTS II and III regions, and made the respective charts of overnight stays in tourist accommodations from 2010 to 2020. I would like to know whether this matches what is intended for the project; if so, I will improve this draft. For the final project, my idea is to extend this work and forecast as in the article "Wickramasuriya, SL, Athanasopoulos, G., & Hyndman, RJ (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804–819. [DOI]", but with Portuguese data. However, I am still not sure whether I can do it, because I think part of the code is in a GitHub repository. Thanks a lot for all the help!
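A hierarchy like the one described can be written as a mapping from each node to its children. The sketch below is a plain-Python illustration, not the code from the draft. The region list is a small subset, the two children of Norte are hypothetical placeholders rather than real NUTS III names, and the overnight-stay figures are made up. The parent-to-children dictionary follows the style seen in the scikit-hts documentation examples; treat that as an assumption about the expected input format.

```python
# Parent -> children mapping; "total" is the root (Portugal).
# Region names are an illustrative subset and all figures are made up.
hierarchy = {
    "total": ["Norte", "Algarve", "Lisboa"],
    "Norte": ["Norte_A", "Norte_B"],  # hypothetical NUTS III placeholders
}

# Overnight stays (in thousands) are observed only at the bottom level.
bottom = {"Norte_A": 120.0, "Norte_B": 80.0, "Algarve": 300.0, "Lisboa": 250.0}


def aggregate(node, hierarchy, bottom):
    """Return the value of `node` as the sum of the leaves below it."""
    children = hierarchy.get(node)
    if not children:  # a leaf: its value is observed directly
        return bottom[node]
    return sum(aggregate(child, hierarchy, bottom) for child in children)


all_nodes = ["total", "Norte", "Algarve", "Lisboa", "Norte_A", "Norte_B"]
values = {node: aggregate(node, hierarchy, bottom) for node in all_nodes}
print(values)
```

Every aggregate is defined by the bottom-level series, which is the property that forecasts for the same hierarchy should also respect.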

niskrev commented 2 years ago

Hi @AnaRitaGomes-3. The main objective of this project is to familiarize yourself with the package and to show something interesting that you can do with it that is not possible, or not as easy, with statsmodels or the other packages we use (pandas, matplotlib, arch). It is enough to take one or more examples from the scikit-hts documentation and replicate them with a different dataset that you find suitable for that purpose. From what I know about scikit-hts, its main objective is to produce aggregate forecasts that agree with the disaggregated ones. If this is something you are interested in pursuing further in your research, I think you should investigate how to do it and show it with an example. It doesn't have to be the full disaggregated tourism data for Portugal; maybe it's possible to define larger groups of regions, e.g. Algarve and the rest of Portugal. It doesn't have to make sense as a research question (at this point); the goal is to demonstrate the usefulness of the software.

The second talk here is a demo of scikit-hts. And here is a different Python package for hierarchical forecasting that may be interesting.
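To make "aggregate forecasts that agree with the disaggregated ones" concrete, here is a minimal plain-Python sketch using the two-group split mentioned above (Algarve and the rest of Portugal). All forecast numbers and the historical shares are hypothetical, and the code does not use scikit-hts; it only shows what incoherent forecasts look like and two simple ways to make them add up.

```python
# Hypothetical independent ("base") forecasts, one per series.
base = {"total": 1000.0, "Algarve": 420.0, "Rest": 610.0}

# Coherence means the aggregate forecast equals the sum of its parts.
gap = base["total"] - (base["Algarve"] + base["Rest"])
print("gap between total and sum of parts:", gap)

# Bottom-up: keep the disaggregate forecasts and rebuild the total from them.
bottom_up = dict(base, total=base["Algarve"] + base["Rest"])

# Top-down: keep the total forecast and split it using shares
# (here assumed historical proportions; the values are made up).
shares = {"Algarve": 0.4, "Rest": 0.6}
top_down = {"total": base["total"]}
for region, share in shares.items():
    top_down[region] = base["total"] * share

print("bottom-up:", bottom_up)
print("top-down: ", top_down)
```

Both results are coherent, but each one discards part of the information in the base forecasts, which is the motivation for reconciliation methods that combine all of them.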

AnaRitaGomes-3 commented 2 years ago

I'm sorry professor @niskrev, but I think I didn't understand well. Can I use the draft I sent for the midterm project? What I sent follows the Australia example found in the package documentation, but applied to Portugal.

niskrev commented 2 years ago

Hi @AnaRitaGomes-3 it is up to you. I didn't look carefully at what you have done there. Did you do any forecasting/forecast reconciliation? Because that's the main reason to use this package. If you want to do that with Portuguese data like this article does for Australia - go for it. But if doing this is too difficult now, you should consider a simpler example instead - there are plenty of examples in the documentation here. One of them shows how to reconcile pre-computed forecasts - you can make forecasts like in our homeworks, and use the package to reconcile them, as the examples show.

Sorry if I am not being clear. If you prefer, let's have a short chat on Zoom and discuss.
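As an illustration of what reconciling pre-computed forecasts involves, the sketch below applies ordinary least squares (OLS) reconciliation with plain NumPy. It is not the scikit-hts API, and the forecast numbers are hypothetical. OLS is the simplest member of the optimal-combination family; the trace-minimization (MinT) approach in the article mentioned earlier in this thread generalizes it by weighting with the forecast-error covariance.

```python
import numpy as np

# Summing matrix S: rows are [total, Algarve, Rest],
# columns are the bottom-level series [Algarve, Rest].
S = np.array([
    [1.0, 1.0],  # total = Algarve + Rest
    [1.0, 0.0],  # Algarve
    [0.0, 1.0],  # Rest
])

# Hypothetical base forecasts; rows match S, columns are horizons 1 and 2.
base = np.array([
    [1000.0, 1040.0],  # total
    [420.0, 430.0],    # Algarve
    [610.0, 640.0],    # Rest
])

# OLS reconciliation: regress the base forecasts on S to get
# bottom-level forecasts, then aggregate them again with S.
bottom, *_ = np.linalg.lstsq(S, base, rcond=None)
reconciled = S @ bottom

print(np.round(reconciled, 2))
# Rows of the reconciled matrix now satisfy total = Algarve + Rest.
```

In this example the 30-unit discrepancy at the first horizon is spread across all three series instead of being assigned entirely to the total or to the regions.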

AnaRitaGomes-3 commented 2 years ago

ok, thanks a lot for the help @niskrev ! In the midterm I will make some forecast examples like the ones you sent and in the final project I will apply part of the analysis of the article from Australia with data from Portugal.

Right now, I'm having trouble using the prophet/fbprophet package. I've seen that it's a very common problem, and from what I've noticed, it only works with older versions of Python. However, I've already tried installing version 3.7 and installing the dependencies first and then prophet, but even so, it always gives the same error: "Failed building wheel for fbprophet". I would need this package both for the midterm and for the final project; otherwise I will only use ARIMA for the forecasts. Thank you so much for all the help and sorry for the inconvenience.

niskrev commented 2 years ago

Hi @AnaRitaGomes-3 it's fine to just use ARIMA (or another forecasting model from statsmodels like holtwinters, which is shown in one of the examples of scikit-hts). The main use of that package is reconciling forecasts, and once you learn how to do that, you can easily adapt to different forecasting models. I tried installing prophet and was successful following these steps. Try it if you want (only proceed to the next step if the previous one is successful), but you don't have to.

```
conda create -n prophet_env python=3.8  # use some other name if you want
conda activate prophet_env
conda install libpython m2w64-toolchain -c msys2  # compiler toolchain (Windows)
conda install numpy cython -c conda-forge
conda install pystan -c conda-forge
conda install -c conda-forge prophet
```

AnaRitaGomes-3 commented 2 years ago

Thank you @niskrev! That way I can install prophet, but I was trying to install it through the environment.yml. Done this way, I have to carry out the work inside the prophet env, and it doesn't have an associated '.yml' file to send to you later, right? Or is there a way to associate a .yml file with it? I haven't managed to do it. Thank you so much for all the help and sorry for the inconvenience again.

niskrev commented 2 years ago

No problem @AnaRitaGomes-3! You can create a .yml file using

```
conda env export --from-history > environment.yml
```

First, activate the environment where you installed prophet and the other packages, and change the directory to the main directory of your project. Executing the command there will create the environment.yml file.

AnaRitaGomes-3 commented 2 years ago

Thank you very much, professor @niskrev. My forecasts are sometimes negative, which makes no sense. I googled it and thought about applying a Box-Cox transform, but it requires a 1-dimensional array, so it's not possible here. I also tried to use the transform function to take the log and then the exp, but I also get errors. I put the files here in the Midterm folder, in case it is useful. Can you help me with this issue?

Thank you very much and I apologize for the inconvenience!

niskrev commented 2 years ago

Hi @AnaRitaGomes-3 It may be helpful to try to understand why there are negative forecasts. I would check whether the estimated model has a good in-sample fit and produces reasonable in-sample forecasts. Also, maybe try some ARIMA model specifications using statsmodels and see if you also get negative forecasts. I would be careful with the transformations (unless they are part of the scikit-hts package), because forecasts for the transformed disaggregate series don't have to sum up to the forecast for the transformed aggregate series, so it may be difficult to know how to reconcile the forecasts of the transformed series.
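The caveat about transformations can be checked with a few lines of arithmetic. The sketch below uses made-up values for two regions and shows that a hierarchy that adds up on the original scale no longer adds up after taking logs.

```python
import math

# Made-up values for two regions on the original scale.
algarve, rest = 400.0, 600.0
total = algarve + rest  # the hierarchy adds up here by construction

# After a log transform, the sum of the parts is the log of the
# PRODUCT of the regions, not the log of their sum.
sum_of_logs = math.log(algarve) + math.log(rest)
log_of_total = math.log(total)

print("log(algarve) + log(rest):", sum_of_logs)
print("log(algarve + rest):     ", log_of_total)
print("additive on the log scale?", math.isclose(sum_of_logs, log_of_total))
```

Because the aggregation constraint is linear only on the original scale, a summing-matrix reconciliation applied directly to log-scale forecasts would enforce the wrong constraint.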

But my first recommendation for you is to start with a much simpler case: use 2 or 3 levels of aggregation instead of the full hierarchy you have in the data. The aggregated numbers for, say, Lisbon, Algarve, Alentejo, etc. are much larger than those for smaller places, and it should be easier to obtain reasonable (positive!) forecasts. At least it would be easier to investigate and find out what is going wrong when you have only a few series. If I were you, I would first consider just 2 disaggregated series (for example Lisbon and the rest of PT), fit some ARIMA models for each, as well as for PT as a whole, and see how to reconcile those forecasts using scikit-hts. Doing something like this is perfectly fine for this project.