Open Huite opened 8 months ago
> you could maintain an internal list of rows "to add". This obviously complicates the internals way too much, because you never know whether your input is actually a dataframe proper, or a dataframe-in-the-making.
Yeah, I suggested something similar to @visr the other day. We for sure need to benchmark a lot before looking into this. I don't think it would complicate internals too much, but we would lose the ability to set dataframes directly.
> Add methods to add many elements at once. Create one dataframe with many rows, and call `concat` once. It provides the same convenience, except you provide it `list[NodeData]` for example. (Add a type check for the single versus multiple methods.)
Sounds good to me. We should make clear that the main reason to use that API is performance.
Maybe somewhat of a random thought. The add API looks very good to me, and will help enormously in setting up models. One downside is that it calls `pd.concat()` once per added element. I'm pretty sure that `concat` allocates an entirely new dataframe. My mental model for dataframes is that each one is essentially a `dict[str, numpy.array]` -- this isn't quite accurate because pandas has a block manager and whatnot, but it's close enough. Anyway, a numpy array cannot be resized; it has to be reallocated, so adding many elements one by one has quadratic time complexity rather than linear. If you want to add 10_000 nodes at once, this is obviously wasteful.
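A minimal sketch of the complexity argument above (the column names and row counts here are illustrative, not from the actual codebase): repeated `pd.concat` copies all existing rows on every call, while building the frame in one go allocates once.

```python
import pandas as pd


def add_one_by_one(rows: list[dict]) -> pd.DataFrame:
    # Each concat allocates a new frame and copies every existing row
    # again, so n additions cost O(n^2) total work.
    df = pd.DataFrame(rows[:1])
    for row in rows[1:]:
        df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
    return df


def add_all_at_once(rows: list[dict]) -> pd.DataFrame:
    # One allocation for the full result: O(n) total work.
    return pd.DataFrame(rows)


rows = [{"node_id": i, "x": float(i), "y": 0.0} for i in range(1000)]
```

Both produce the same dataframe; only the allocation pattern differs.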
Bad solutions:

You could maintain an internal list of rows "to add". This obviously complicates the internals way too much, because you never know whether your input is actually a dataframe proper, or a dataframe-in-the-making.

Good solution:

Add methods to add many elements at once. Create one dataframe with many rows, and call `concat` once. It provides the same convenience, except you provide it `list[NodeData]` for example. (Add a type check for the single versus multiple methods.) Compare the `shapely.linestrings()` constructor, which also builds many geometries in a single call.
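A sketch of what such an add-many method could look like. This is not the actual API: `Node`, `NodeData`, the `add` method, and the column names are all hypothetical, chosen only to illustrate the single-versus-multiple type check and the single `concat` per call.

```python
from __future__ import annotations

from dataclasses import dataclass

import pandas as pd


@dataclass
class NodeData:
    # Hypothetical record type; the real schema will differ.
    node_id: int
    node_type: str


class Node:
    def __init__(self) -> None:
        self.df: pd.DataFrame | None = None

    def add(self, data: NodeData | list[NodeData]) -> None:
        # Type check dispatching the single versus multiple cases.
        if isinstance(data, NodeData):
            data = [data]
        new = pd.DataFrame([vars(d) for d in data])
        # One concat per call, however many rows are added at once.
        self.df = new if self.df is None else pd.concat(
            [self.df, new], ignore_index=True
        )


node = Node()
node.add(NodeData(1, "Basin"))                         # single element
node.add([NodeData(2, "Pump"), NodeData(3, "Basin")])  # many at once
```

Adding k elements this way costs one allocation instead of k, which is where the performance argument for the bulk API comes from.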