apache / incubator-hugegraph

A graph database that supports 100+ billion data records, with high performance and scalability (includes the OLTP engine, REST API, and backends)
https://hugegraph.apache.org
Apache License 2.0

[Bug] HugeClient batch data write fails with: java.lang.IllegalArgumentException: the max length of bytes is 65535, but get 337492. #2291

Open dongma opened 1 year ago

dongma commented 1 year ago

Bug Type

rest-api (unexpected result)


Environment (环境信息)

Graph version: 1.0.0. Storage backend: HBase. Data volume: 100M+ vertices and 100M+ edges. The vertices were written successfully; the edges carry many property fields and are written in batches of about 1000 records.

Expectation: the request body exceeds 65535 bytes; can a parameter be adjusted on the graph server side? (Which parameter limits the request body size?)

Expected & Actual behavior

When writing batch data with HugeClient, a single request body exceeds the server's default 65535 limit, so the edge (relationship) data cannot be written. The exception stack trace is as follows:

java.lang.IllegalArgumentException: the max length of bytes is 65535, but get 337492.
at org.apache.hugegraph.exception.ServerException.fromResponse(ServerException.java:45)
at org.apache.hugegraph.client.RestClient.checkStatus(RestClient.java:91)
at org.apache.hugegraph.rest.AbstractRestClient.post(AbstractRestClient.java:232)
at org.apache.hugegraph.api.graph.EdgeAPI.create(EdgeAPI.java:58)
at org.apache.hugegraph.driver.GraphManager.addEdges(GraphManager.java:262)
at org.apache.hugegraph.driver.GraphManager.addEdges(GraphManager.java:254)
.....
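For context, here is a minimal sketch of the write path that fails above. The connection-builder usage and the Edge class location are assumptions based on the hugegraph-client 1.x API; the batch itself is a stub:

```java
import java.util.Collections;
import java.util.List;

import org.apache.hugegraph.driver.GraphManager;
import org.apache.hugegraph.driver.HugeClient;
import org.apache.hugegraph.structure.graph.Edge;

public class BatchEdgeWrite {

    public static void main(String[] args) {
        // builder(url, graphName) is the hugegraph-client 1.x entry point.
        HugeClient client = HugeClient.builder("http://localhost:8080", "hugegraph")
                                      .build();
        GraphManager graph = client.graph();

        // Stub for illustration; the real job builds ~1000 edges per batch,
        // each carrying many Text properties.
        List<Edge> batch = Collections.emptyList();

        // GraphManager.addEdges is the call from the stack trace above; the
        // server rejects the batch once any Text value exceeds 65535 bytes.
        graph.addEdges(batch);

        client.close();
    }
}
```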

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

imbajin commented 1 year ago

@DanGuge could you take a look at it? Better to make the vertex/edge ID length configurable.

DanGuge commented 1 year ago

> @DanGuge could you take a look at it? Better to make the vertex/edge ID length configurable.

I will check this later

dongma commented 1 year ago

> @DanGuge could you take a look at it? Better to make the vertex/edge ID length configurable.

Thanks for your reply. I resolved this problem last night [grin]. The reason records failed to write to the HugeGraph server is that the length of the Text (String) type is limited to 65535.

After checking my data rows, I found that in a few rows one property value's length exceeded 65535; the write operation succeeded after limiting the property length.

Below is the full logic to check the property value length:

(screenshots of the length-check code omitted)
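Since the screenshots did not survive, here is a minimal sketch of such a check. It assumes the 65535 limit applies to the UTF-8 encoded byte length; the constant name TEXT_MAX_BYTES is mine, not HugeGraph's:

```java
import java.nio.charset.StandardCharsets;

public class PropertyLengthCheck {

    // HugeGraph's serializer rejects Text values longer than 65535 bytes.
    private static final int TEXT_MAX_BYTES = 65535;

    /** True if the value fits; the limit counts UTF-8 bytes, not characters. */
    static boolean fitsTextLimit(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= TEXT_MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fitsTextLimit("short value"));      // true
        System.out.println(fitsTextLimit("x".repeat(337492))); // false: 337492 > 65535
    }
}
```
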
dongma commented 1 year ago

Closing my issue; the length of a Text property value should be less than 65535.

imbajin commented 1 year ago

> Closing my issue; the length of a Text property value should be less than 65535.

We know the limitation, and we are considering adding an option for users to modify it (likewise for the length of the vertex/edge ID).

LiJie20190102 commented 7 months ago

> Closing my issue; the length of a Text property value should be less than 65535.
>
> We know the limitation, and we are considering adding an option for users to modify it (likewise for the length of the vertex/edge ID).

Excuse me, is there a PR that resolves this issue?

imbajin commented 7 months ago

> Closing my issue; the length of a Text property value should be less than 65535.
>
> We know the limitation, and we are considering adding an option for users to modify it (likewise for the length of the vertex/edge ID).
>
> Excuse me, is there a PR that resolves this issue?

Thanks for the reminder; I'll address it again.

LiJie20190102 commented 7 months ago

I'd like to implement this feature; could it be assigned to me?

imbajin commented 7 months ago

> I'd like to implement this feature; could it be assigned to me?

@LiJie20190102 If you want to make the property length configurable, we could reopen this issue and link the PR to it. If you only want to make the vertex ID length configurable, you could submit a new issue (better).

dongma commented 7 months ago

> Closing my issue; the length of a Text property value should be less than 65535.
>
> We know the limitation, and we are considering adding an option for users to modify it (likewise for the length of the vertex/edge ID).
>
> Excuse me, is there a PR that resolves this issue?

@LiJie20190102 I haven't created a PR to the repository to fix this problem; "extract the limitation into a configuration" is the suggestion from the database developers. We filtered out the error rows that contain over-long fields (more than 65535 bytes) when importing data into HugeGraph.
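A minimal sketch of that filtering step, assuming rows are modeled as maps and only string-typed fields need the check (names here are illustrative, not from the actual import job):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ImportRowFilter {

    private static final int TEXT_MAX_BYTES = 65535;

    /** Keep only rows in which every string field fits the Text limit. */
    static List<Map<String, Object>> dropOversizedRows(List<Map<String, Object>> rows) {
        return rows.stream()
                   .filter(row -> row.values().stream().noneMatch(
                           v -> v instanceof String
                                && ((String) v).getBytes(StandardCharsets.UTF_8).length
                                   > TEXT_MAX_BYTES))
                   .collect(Collectors.toList());
    }
}
```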

LiJie20190102 commented 7 months ago

> I'd like to implement this feature; could it be assigned to me?
>
> @LiJie20190102 If you want to make the property length configurable, we could reopen this issue and link the PR to it. If you only want to make the vertex ID length configurable, you could submit a new issue (better).

I want to do both.

LiJie20190102 commented 7 months ago

I think a good solution is not to limit the length of vertex/edge IDs or attributes, but to dynamically adjust the constants in org.apache.hugegraph.backend.serializer.BytesBuffer, such as UINT8_MAX, because in business use the length of vertex IDs sometimes really does need to be that long. In that case we should not limit the length, but rather raise the limit to accept the corresponding value. WDYT @imbajin @dongma

imbajin commented 7 months ago

> I think a good solution is not to limit the length of vertex/edge IDs or attributes, but to dynamically adjust the constants in org.apache.hugegraph.backend.serializer.BytesBuffer, such as UINT8_MAX, because in business use the length of vertex IDs sometimes really does need to be that long. In that case we should not limit the length, but rather raise the limit to accept the corresponding value. WDYT @imbajin @dongma

Var-length encoding generally affects read and write performance, including but not limited to the cost of edge-ID padding and of scanning properties. Would it be more appropriate to let users customize the configuration first? (Then we could try to test the var-length approach.)

@dosu-bot What's your suggestion/analysis?

dosubot[bot] commented 7 months ago

🤖

Hey @imbajin, great to see you back with another insightful query! Hope you're doing well.

Thank you for your detailed explanation and suggestion. However, allowing users to customize configurations to dynamically change the attribute values of org.apache.hugegraph.backend.serializer.BytesBuffer, such as UINT8_MAX, could potentially lead to issues.

The constants defined in the BytesBuffer class, like UINT8_MAX, UINT16_MAX, UINT32_MAX, ID_LEN_MAX, BIG_ID_LEN_MAX, STRING_LEN_MAX, BLOB_LEN_MAX, etc., define the maximum sizes of different data types and identifiers. They are used throughout the codebase to ensure data integrity and consistency.

Changing these values dynamically could lead to unpredictable behavior, data corruption, or even system crashes. For example, if you increase the UINT8_MAX value, it could lead to overflow issues when the data is read back because the system expects the data to be within the original range.
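To make the overflow point concrete, here is a toy illustration (not HugeGraph's actual serializer code) of why a 2-byte length prefix caps a value at 65535 bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LengthPrefixDemo {

    // A value written with an unsigned 16-bit length prefix can hold at most
    // 2^16 - 1 = 65535 bytes; anything longer cannot be encoded faithfully.
    static void writeString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > 0xFFFF) {
            throw new IllegalArgumentException(
                    "the max length of bytes is 65535, but get " + bytes.length);
        }
        buf.putShort((short) bytes.length); // the 2-byte prefix readers rely on
        buf.put(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(1 << 20);
        writeString(buf, "fits fine");
        writeString(buf, "x".repeat(337492)); // throws, mirroring the reported error
    }
}
```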

Moreover, these constants are also used to allocate memory for buffers. Dynamically changing these values could lead to inefficient memory usage or even out-of-memory errors if the values are set too high.

Therefore, it's not recommended to allow users to customize these configurations dynamically. If you need to handle larger IDs or attributes, it would be better to consider changing the data model or using a different data type that can accommodate larger values.

If you still want to proceed with this, it would require a significant amount of testing to ensure the system behaves correctly under different configurations. It's also important to provide clear documentation and warnings to users about the potential risks of changing these values.

As for the role of org.apache.hugegraph.backend.serializer.BytesBuffer in the HugeGraph database, it is a utility for reading and writing binary data. It extends the OutputStream class and provides methods for writing and reading various data types to and from a byte buffer. You can find more details about its role in the source code.

I hope this helps. If you have any other questions or need further clarification, feel free to ask.
