Open dotChris90 opened 6 years ago
What do you think of inheriting from DynamicObject?
I am not sure. DynamicObject gives a class a lot of flexibility.
Maybe it should not inherit from anything at the moment.
I think in the future it could implement something like IEnumerable so we get something like index/column pairs.
But for a start maybe it should not inherit from anything, and we will see while implementing whether we need inheritance.
But if you already found a reason for inheritance, let me know :)
@Oceania2018 if you do not mind, I will add a branch to play around with the data frame. Just because the frame class is the most important class in pandas.
@dotChris90 Go ahead. DataFrame is critical. Will see your experiment.
@Oceania2018 haha ah! Now I get your point about why we need DynamicObject! I just saw that each DataFrame object has some properties which are dynamic. If the columns are A, B, C, D, it will have properties A, B, C, D. OK, yes, I totally agree with you now.
But the con is that we won't get strongly typed hints once we inherit from DynamicObject.
@Oceania2018 that's true --> in the end I experimented a little bit without DynamicObject. Performance counts, since this is the strongest benefit of NumSharp and the corresponding projects (static types are better and faster). ;)
You can still reach the columns with df['column1']
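For readers following the thread, here is a rough sketch of the trade-off discussed above. It assumes a hypothetical `DynamicFrame` class with `double[]` columns (neither is the project's actual API) and shows both access paths: dynamic members resolved at run time through `DynamicObject.TryGetMember`, and a statically typed string indexer.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical class for illustration only, not the project's DataFrame.
// Columns are reachable as dynamic members (df.A) and via an indexer (df["A"]).
public class DynamicFrame : DynamicObject
{
    private readonly Dictionary<string, double[]> _columns =
        new Dictionary<string, double[]>();

    // Statically typed access: the compiler knows this returns double[].
    public double[] this[string column]
    {
        get => _columns[column];
        set => _columns[column] = value;
    }

    // Invoked by the runtime binder for df.A when A is not a declared member.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        bool found = _columns.TryGetValue(binder.Name, out double[] values);
        result = values;
        return found;
    }

    // Exposes the column names to anything that enumerates dynamic members.
    public override IEnumerable<string> GetDynamicMemberNames() => _columns.Keys;
}

public static class Program
{
    public static void Main()
    {
        var df = new DynamicFrame();
        df["A"] = new[] { 1.0, 2.0, 3.0 };

        double[] typed = df["A"];   // checked at compile time
        dynamic dyn = df;
        double[] viaMember = dyn.A; // resolved at run time, no compile-time checking

        // Both paths return the same underlying array.
        Console.WriteLine(ReferenceEquals(typed, viaMember));
    }
}
```

The `dyn.A` line only works through a `dynamic` reference, which is where the loss of type hints comes from; a misspelled column name there fails at run time instead of at compile time.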
Hi, great idea and promising project! It is very meaningful for people who are familiar with the pandas API and want to use C# for data analysis. I have been watching this project since I found it, and hope I can contribute some code in the future.
Here's some advice for this issue:
Actually, a pandas DataFrame can store columns of different types. So I think it may not be appropriate to define the TData generic type for the class DataFrame, nor to use a whole NDArray as the internal data container of DataFrame.
There are two libraries which could be a reference for you:
@VanyTang thanks for the advice. Is this true? O.o Omg - honestly I had no idea that pandas is so dynamic .... this explains why their performance is so bad. Before I was really hoping that the columns have at least the same data type. Yes actually that is quite critical information.
@Oceania2018 maybe we should at least look at Deedle + ML.Dataframe and also the pandas source code itself. Honestly speaking, I really hate this pandas "my columns can be anything" approach for performance reasons. But at least we should think again about the pros and cons.
@VanyTang by the way, thanks for the kind words. A numerical stack is really something that is missing in the .NET world. Java and Python were the key languages in this area, but I think it is time to show that we .NET developers are also interested in this. ;)
@dotChris90 Deedle is designed for F# and people complain about its performance. Let's do DataFrame<TIndex,TData>; we might add a new type for the Y (label) column, think about DataFrame<TIndex, Tx, Ty>
@VanyTang Thanks for your information, and you are welcome to discuss and contribute.
Our goal is to mimic Python pandas in .NET, so that Python machine learning code can be transferred into C# with as little effort as possible.
@VanyTang and one more to mention. :) no matter if sharing codes, ideas, discussions, articles, considerations, links,....
We welcome everybody to share their knowledge. We are dotnet developers, we are open source nerds, we are all just humans, and if we really want to make our dotnet framework great in machine learning and make our ideas and wishes come true, we need every possible suggestion, hint, etc. from every one of you. So please always feel free to post issues and suggestions.
😊
@dotChris90 I think the dynamic column data type is necessary.
Yeah, probably.... But I have no idea how we should handle this in a clean way.
need some more investigation
Sorry that I already raised an issue while everything is under construction >.<.
We should not let DataFrame be a child of NDArray. In pandas, the DataFrame is a child of a general pandas object and has no inheritance connection to ndarray. I think we will face the same problems if we inherit from NDArray.
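One possible way to picture this, purely as a sketch: the frame holds its columns by composition, and each column carries its own element type, so mixed column types need neither a single `TData` nor an array base class. All names here (`IColumn`, `Column<T>`, `Frame`) are hypothetical and are not taken from the project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; not the project's API.
public interface IColumn
{
    string Name { get; }
    Type ElementType { get; }
    int Length { get; }
}

// Each column is strongly typed on its own element type.
public sealed class Column<T> : IColumn
{
    public Column(string name, T[] values) { Name = name; Values = values; }
    public string Name { get; }
    public T[] Values { get; }
    public Type ElementType => typeof(T);
    public int Length => Values.Length;
}

// The frame *has* columns (composition); it does not derive from an array type.
public sealed class Frame
{
    private readonly Dictionary<string, IColumn> _columns =
        new Dictionary<string, IColumn>();

    public int RowCount { get; private set; }

    public void Add<T>(Column<T> column)
    {
        if (_columns.Count > 0 && column.Length != RowCount)
            throw new ArgumentException("All columns must have the same length.");
        _columns[column.Name] = column;
        RowCount = column.Length;
    }

    // The caller states the expected element type; a mismatch throws InvalidCastException.
    public Column<T> Get<T>(string name) => (Column<T>)_columns[name];

    public Type TypeOf(string name) => _columns[name].ElementType;
}

public static class Program
{
    public static void Main()
    {
        var frame = new Frame();
        frame.Add(new Column<int>("age", new[] { 31, 45 }));
        frame.Add(new Column<string>("name", new[] { "Ann", "Bob" }));

        int firstAge = frame.Get<int>("age").Values[0];
        Console.WriteLine($"{firstAge} {frame.TypeOf("name").Name}");
    }
}
```

Access within a column stays statically typed, and the only run-time check is the cast in `Get<T>`. Whether that cost is acceptable for the performance goals mentioned earlier would still need measuring.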