Tencent / HunyuanDiT

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
https://dit.hunyuan.tencent.com/

IndexKits How to support English? #104

Closed h3clikejava closed 3 months ago

h3clikejava commented 3 months ago

(mmdet) h3c@ai-1:~/Documents/HunyuanDiT$ python ./hydit/data_loader/csv2arrow.py ./dataset/test/csvfile/image_text.csv ./dataset/test/arrows
./dataset/test/csvfile/image_text.csv ./dataset/test/arrows
Traceback (most recent call last):
  File "/home/timehut/Documents/HunyuanDiT/./hydit/data_loader/csv2arrow.py", line 88, in <module>
    make_arrow(csv_root, output_arrow_data_path)
  File "/home/timehut/Documents/HunyuanDiT/./hydit/data_loader/csv2arrow.py", line 41, in make_arrow
    data = pd.read_csv(csv_root)
  File "/home/timehut/miniconda3/envs/mmdet/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/timehut/miniconda3/envs/mmdet/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 617, in _read
    return parser.read(nrows)
  File "/home/timehut/miniconda3/envs/mmdet/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1748, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/home/timehut/miniconda3/envs/mmdet/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "parsers.pyx", line 843, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 904, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 879, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2058, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 3, saw 16

image_path,text_zh
./dataset/test/images/0.jpg,pattern, a vibrant, stylized starburst in shades of yellow and orange. The star has elongated points radiating outward, creating a dynamic and bright appearance.
./dataset/test/images/1.jpg,pattern, At the center, there is a transparent vase with a long stem holding two blooming roses, one positioned above the other. The vase is rendered in a light blue sketch style. The roses, in red, are detailed with green leaves. Behind the roses, there is a subtle outline of a face in profile, seemingly looking towards the flowers. To the right of the roses, there is a blue butterfly in flight, adding a touch of whimsy. Additionally, two yellow star-like sparkles are present, one near the vase and one closer to the butterfly. The overall style of the drawing is light, airy, and sketch-like, giving it an elegant and whimsical feel.a delicate and artistic drawing.
./dataset/test/images/2.jpg,pattern, a beautifully artistic drawing of a butterfly with wings that appear to be made of delicate, translucent blue lines. The butterfly's wings are intricately adorned with small red roses and green leaves, adding a touch of natural beauty and elegance. The roses are strategically placed along the edges and center of the wings, creating a harmonious blend of floral and insect elements. The overall style of the drawing is light and airy, with a watercolor-like quality, giving it a whimsical and ethereal feel. The butterfly is depicted in flight, with light blue lines trailing behind it, suggesting movement and grace. .

Perhaps the English comma was treated as a delimiter. How can I resolve this?
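
One likely fix, sketched here under assumptions: the caption text contains unquoted commas, so each comma starts a new field. The first data row splits into 5 fields and the second into 16, which matches the error message. Writing the file with a CSV writer that quotes any field containing a comma keeps each caption in a single column. The example below uses only Python's standard `csv` module; the `write_caption_csv` helper and the shortened caption are illustrative and are not part of HunyuanDiT. The column names `image_path` and `text_zh` are taken from the CSV above.

```python
import csv
import io

# Illustrative rows: the caption contains commas, as in the failing file.
rows = [
    ("./dataset/test/images/0.jpg",
     "pattern, a vibrant, stylized starburst in shades of yellow and orange."),
]


def write_caption_csv(rows, fileobj):
    """Write image/caption pairs, quoting any field that contains a comma."""
    # QUOTE_MINIMAL is the csv module default; it is spelled out for clarity.
    writer = csv.writer(fileobj, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["image_path", "text_zh"])
    writer.writerows(rows)


# Write to an in-memory buffer here; use open(path, "w", newline="") for a file.
buffer = io.StringIO(newline="")
write_caption_csv(rows, buffer)

# Read the data back: each row now has exactly two fields.
buffer.seek(0)
parsed = list(csv.reader(buffer))
print(parsed[1])
```

`pandas.read_csv` treats double-quoted fields as single values by default, so a file written this way should load as two columns. Whether `csv2arrow.py` needs any further change for English captions is not confirmed here.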