accord-net / framework

Machine learning, computer vision, statistics and general scientific computing for .NET
http://accord-framework.net
GNU Lesser General Public License v2.1

Time Series analysis support #884

Open xieliaing opened 6 years ago

xieliaing commented 6 years ago

Time series analysis has a wide range of real-world applications and is used by businesses of almost every kind. The framework currently has no formal support for it. Are there any plans to add it?

cesarsouza commented 6 years ago

Hi @xieliaing,

Many thanks for opening the issue!

I completely agree with your remarks. To answer your question: yes, there are plans to add support for time series analysis in the near future. The framework can already classify time series through its Support Vector Machine, Hidden Markov Model and Hidden Conditional Random Field classes, but it is true that it cannot yet perform regression or more general analysis such as ARIMA modeling.

As such, may I ask exactly which methods you would be most interested in seeing the framework support in the future? If possible, please add some references for the models you think would be most useful in the real world, so we can prioritize what should be implemented first.

Best regards, Cesar

cesarsouza commented 6 years ago

Also, if you would like to contribute a method yourself, please do not hesitate to send in a pull request!

Best regards, Cesar

CatchemAL commented 6 years ago

Ooh, interesting. I'll follow this thread closely. As @cesarsouza says, it would be good to know which kinds of time series analysis particularly interest you.

If it's of any interest, I was planning to add EWMA and exponentially weighted moving variance-covariance calculations to the framework at some point in the next couple of months. If you'd really like them, I can probably add them sooner (or you're very welcome to contribute if you'd prefer, as César also mentioned).
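For readers unfamiliar with the two calculations mentioned above, here is a minimal sketch in Python. It is not the framework's implementation: the function names `ewma` and `ew_variance`, the smoothing parameter `alpha`, and the choice to seed the recursion with the first observation are all assumptions made for illustration.

```python
def ewma(series, alpha):
    """Exponentially weighted moving average (illustrative sketch).

    Assumption: the recursion is seeded with the first observation.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]
    for x in series[1:]:
        # New value = alpha * observation + (1 - alpha) * previous average.
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def ew_variance(series, alpha):
    """Exponentially weighted moving variance (illustrative sketch).

    Assumption: the variance starts at zero and is updated incrementally
    together with the running mean.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    mean = float(series[0])
    variance = 0.0
    result = [variance]
    for x in series[1:]:
        diff = x - mean
        incr = alpha * diff
        mean += incr  # update the running mean
        variance = (1.0 - alpha) * (variance + diff * incr)
        result.append(variance)
    return result


print(ewma([1, 2, 3], 0.5))  # prints [1.0, 1.5, 2.25]
```

A larger `alpha` weights recent observations more heavily; other initialization conventions exist and would give different early values.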

xieliaing commented 6 years ago

@cesarsouza @AlexJCross Thanks for the response. Yes, time series analysis is a big topic, so we can choose something small and frequently used as a starting point.

I see these as two categories:

  1. Distribution and descriptive analysis: for example, various filters, stationarity tests (ADF, KPSS, etc.), and calculation of the periodogram, ACF, PACF, and so on. I have an example of computing the ACF with the Fourier transform from Accord.Audio. This class of methods would make it much easier for analysts like me to conduct formal time series studies.
  2. Time series modeling, from EWMA to X13 to ARIMA or even state space models. A formal implementation of an MLE-based ARIMA model would definitely help. Currently I use an iterative linear regression process to approximate ARIMA (a method used in the 1980s because of limited computing power).

I would love to contribute, but I am not a professional programmer, so my code may not be as good as a professional's.
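To make the first category concrete, the following is a rough Python sketch of computing the ACF through the Fourier transform. It is not the Accord.Audio-based example mentioned above, which is not shown in this thread. The helper names `fft` and `acf_fft` are hypothetical, and a small radix-2 FFT is written out by hand so the sketch needs only the standard library.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    The inverse transform is left unnormalized (caller divides by n).
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def acf_fft(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag via the power spectrum.

    Assumes max_lag < len(series) and a non-constant series.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    # Zero-pad to a power of two >= 2n so the circular correlation
    # computed by the FFT equals the linear one.
    size = 1
    while size < 2 * n:
        size *= 2
    padded = [complex(x) for x in centered] + [0j] * (size - n)
    power = [abs(c) ** 2 for c in fft(padded)]
    raw = fft([complex(p) for p in power], inverse=True)
    autocov = [raw[k].real / size / n for k in range(max_lag + 1)]
    return [c / autocov[0] for c in autocov]


print(acf_fft([1, 2, 3, 4, 5], 2))
```

The result should agree, up to floating-point error, with the direct sum-of-products definition of the sample ACF; the FFT route mainly pays off for long series.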

CatchemAL commented 6 years ago

> I would love to contribute but I am not professional programmer, so my code may not be as good as one from a professional one.

Well most of my code is terrible but that doesn't stop me :-)

Seriously though, if you'd like to add something, please do! Even if it's just the guts of an algorithm, I'm sure it would be very useful to a lot of people. I don't have much knowledge on the categories you listed above (I'm no statistician); I will, however, add EWMA this week.

Thanks, Alex

xieliaing commented 6 years ago

I am not good at class design, but I can follow the design of statsmodels in Python. A good starting point is tsa/stattools.py, which I can translate into C#. Many building blocks are already available in Accord.Audio, which I still need to get familiar with first.

cesarsouza commented 6 years ago

Hi @xieliaing,

That could be a good start! In fact, the statsmodels library is under the 3-clause BSD license, so it should be fine to base implementations on it. It should be possible to start with simple translations of the basic machinery needed to support hypothesis tests such as Ljung–Box, ADF and KPSS. Most of the methods used in those tests would likely be implemented as static methods in some static class anyway, so the design would not differ very much from the Python file you mentioned.

For the tests themselves, I can wrap them into classes later, following the style of the rest of the framework, and add them to a new TimeSeries namespace under Accord.Statistics.Testing.
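As an illustration of how small this basic machinery can be, here is a sketch of the Ljung–Box Q statistic written as plain functions in Python. It is not taken from statsmodels or from the framework; the names `autocorrelation` and `ljung_box_statistic` are hypothetical, and the p-value is left out because it would need a chi-square distribution function.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (direct definition)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    total = sum(c * c for c in centered)
    if total == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / total
        for k in range(max_lag + 1)
    ]


def ljung_box_statistic(series, lags):
    """Ljung-Box Q = n (n + 2) * sum_{k=1..h} rho_k^2 / (n - k)."""
    n = len(series)
    if not 0 < lags < n:
        raise ValueError("lags must satisfy 0 < lags < len(series)")
    rho = autocorrelation(series, lags)
    return n * (n + 2) * sum(rho[k] ** 2 / (n - k) for k in range(1, lags + 1))


print(ljung_box_statistic([1, 2, 3, 4, 5], 2))
```

Under the null hypothesis of no autocorrelation, Q is approximately chi-square distributed with `lags` degrees of freedom when applied to a raw series, so a class wrapper would mainly add the p-value and reporting.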

Please don't worry about design at this stage! It would be better to start with working implementations first so we could write enough unit tests to be able to change the design at will later, without risking introducing bugs.

Best regards, Cesar

xieliaing commented 6 years ago

Sounds like a plan. What I can do now is provide C# implementations of these functions first and test them. Later, these functions can be incorporated into a well-designed TimeSeries class.

xieliaing commented 6 years ago

@cesarsouza Are there any guidelines on pull requests and checking in code? What about code style?

cesarsouza commented 6 years ago

@xieliaing Yes, the contributing guidelines are here. For code style, I would say it would be preferable to stick to the original formatting guidelines provided by Visual Studio (i.e. the formatting applied when hitting Ctrl+E, D) since it is probably the most common format out there.

xieliaing commented 6 years ago

@cesarsouza Just to clarify, what I can do is

  1. git checkout -b feature to create a new feature branch of my own;
  2. git add Files to stage the new files;
  3. git commit -m "message" to commit the code and wait for code review. I believe git request-pull serves the same goal but allows a more elaborate message;
  4. after the code review passes, merge with a freshly pulled master.

Is this process right?

cesarsouza commented 6 years ago

Hi @xieliaing,

Almost! For step 4, please submit your pull requests against the development branch of the project instead of master.

Regards, Cesar

cesarsouza commented 6 years ago

Well, actually, to tell the truth, I am not completely sure whether the other commands are exactly spot on. But do not worry: please do it in whichever way is easiest for you, and I can take care of merging the code afterwards.

In my own experience, the ideal workflow would be:

  1. Fork the repository to your own GitHub account;
  2. Check out the development branch;
  3. Starting from the development branch, create a new branch for the feature you would like to add (e.g. GH-884);
  4. git add any files you would like;
  5. git commit -m "message";
  6. open a pull request here in GitHub asking for the changes from your "GH-884" branch to be merged with "development";
  7. Wait for code review.

But as I said before, please do it in whichever way is easiest for you. I can take care of the merges and pull requests afterwards if needed.

Regards, Cesar

xieliaing commented 6 years ago

@cesarsouza Thanks for the detailed explanation. I will follow the above steps. Thanks for reminding me about the development branch.

CatchemAL commented 6 years ago

Hi @xieliaing,

You've probably seen me doing it on this thread, but if you include the text "#884" or "GH-884" in any of your git commit messages, the commit will automatically get linked to this issue. You don't have to do that(!), but being able to tie a commit to an issue makes it clear in two years' time why the commit was made.

On the pull request thing, César said it best: do whatever is easiest for you in the first instance. Pull requests are not too bad once you get the hang of them. Octocat has a really nice tutorial and a repo (called Spoon-Knife) you can practice on, but if you have any questions, feel free to ask. https://help.github.com/articles/fork-a-repo/

Best, Alex

xieliaing commented 6 years ago

@AlexJCross You act fast, super! How do I do a CR on GitHub?

CatchemAL commented 6 years ago

Hey @xieliaing,

My git terminology is not all that good I'm afraid. Is CR code review? I know PR as pull request.

Anyway, if it's one of those, once you create a branch in your forked repository (e.g. xieliaing-Samples-TimeSeries) and push some code to GitHub, you should see a pull request button on the page. Choose your branch to merge from and then Accord's development branch to merge to.

In terms of code review, GitHub will provide a breakdown of the files changed/added between the two branches once you send a PR so your work can easily be reviewed in that way.

In terms of reviewing, I am happy to have a cursory look over but my knowledge of stats is not all that good so I might need to defer to César for this.

Best, Alex

xieliaing commented 6 years ago

@AlexJCross I reviewed the tutorial you mentioned; it was very helpful. Basically, here is what I need to do:

  1. Fork the framework to my own GitHub account;
  2. Set the Accord framework repository as the upstream remote of my fork;
  3. Sync with upstream to get the most recent version;
  4. In my fork, create a new branch;
  5. Add any TimeSeries-related files to the new branch;
  6. Commit with the message "GH-884 ......";
  7. Create a new pull request from my fork to the Accord framework development branch;
  8. Wait for review.

cesarsouza commented 6 years ago

Thanks @xieliaing, I've added a review to the pull request you had created!

xieliaing commented 6 years ago

@cesarsouza can you merge this thread to GH-884?