Open windrad6 opened 1 year ago
Hi, I see the pandas
and numpy
are also being used in timeseries.py
. Do you have an idea how to change it in a clean way? Otherwise, a local branch or a fork would be the quick and dirty solution
@windrad6 Could you give a bit more info about the Raspberry PI OS and which commands exactly are you using to install filip?
@windrad6 The TimeSeriesApi strongly relies on pandas as it is designed to parse the data to a format that users can directly trough into theier analysis algorithms. Making pandas optional would have two consequences:
The installation of numpy/pandas on raspberry pi is a known issue and it is a pitty that there is no real way to fix that without waiting for hours. Nevertheless, it is probably not the worst idea to reduce dependencies of the library if not required :)
@tstorek Would removing the dependency on pandas become less of a hassle if filip had a separate "lite" installation setup?
Most of the pandas usages are in the examples and tests, which the regular filip user doesn't need.
We would "just" have to figure out how to deal with the usages in Units.
@Maghnie this is incorrect. Examples and tests are independent from the actual package installation. Removing pandas would partly break the timeseries models as already pointed out. Nevertheless, it is probably worth thinking of making unit validation not as strict as it is implemented right now or as an extension package. The current implementation may change with the migration to pydantic v2 (#199) anyway.
The decision on this is up to the core-developer-team. (@djs0109)
@Maghnie I find your idea of creating a "lite" version good. Do you already have an idea of what functionalities are not required? For example, Unit, TimeSeries API, and so on. I am not sure if a "lite" version can be made within the same repo, or should you make a fork? But as for now, I can only promise support in reviewing because our funded projects have not yet encountered such a requirement.
this is incorrect.
What is correct then is the practice of making controversial statements to insure a speedy reply, which is confirmed here. :)
Removing pandas would partly break the timeseries models as already pointed out.
I'm not closely familiar with the code but after skimming it, I'm not sure how removing pandas breaks the TimeSeries models, since its to_pandas
method seems to be used only in tests.
Do you already have an idea of what functionalities are not required? For example, Unit, TimeSeries API, and so on.
Yes, I would start there too. We could later look into other bulky packages, if needed.
I am not sure if a "lite" version can be made within the same repo, or should you make a fork?
I imagine it can be done in the same repo, but just with different installation instructions.
But as for now, I can only promise support in reviewing because our funded projects have not yet encountered such a requirement.
Sure, I understand. Filip works in I-GReta too, it just takes a lot of time (aka money) for the installation.
@Maghnie Let me clarify. to_pandas
is a functionality provided by filip to the user. This would not be possible without pandas. Hence, removing pandas would break this functionality of the model. But as pointed it out this would need some rafactoring to make it optional.
Examples only use it for demonstration reasons and Tests insure that it works as expected. Usually, when using installation via pip the user does not even regognize their existence.
The Unit model is more complex and would require a little more refactoring because it uses pandas to manage all unit-information. Like a tiny database. Furthermore, the attribute models automatically enforce unit validation and auto-completion if some special keywords are used. Especially, on edge-device level this prohibts falsy unit usage. But apparently to the mentioned use-case it comes with some drawbacks that I was not aware of when I implemented that feature.
Nevertheless, personally, I agree with you that making both functionalities optinal is probably a good idea. But it seems easier as you might expect. That is what I tried to say. Nevertheless, splitting the library into several parts will most likely favor its usabilty in different scenarios.
BTW: I suggest to have a look into optional imports and extra dependencies before starting an additonal repo:
try:
import pandas as pd
has_pandas = True
ecxept ImportError():
has_pandas = False
if has_pandas:
do something with pandas....
else:
raise ImportError()
Furthermore, you need to adjust the setup.py and add extras
. Have a look here: https://github.com/RWTH-EBC/ebcpy/blob/master/setup.py
You can find further exmaples online.
After trying to install FiLiP on a Raspberry PI I ran into issues with pandas/numpy. I checked the code and pandas is mainly used in examples, tests and tutorials. For the actual library, it is only used in models/ngsi_v2/units.py for load_units().
Can we make it optional?
It is one of the main drivers for time consumption during setup and a source of incompatibilities on different operating systems.