Searching aligned texts is not very convenient

daviddalao commented 2 years ago

Where is this problem coming from? Also, is Parallel Concordancer unable to display search results from multiple TMX files at the same time?

BLKSerene commented 2 years ago

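Under generation settings on the right, the source file and target file have to be specified manually, and they cannot be the same file. One-to-many searching will be considered for a later version.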

BLKSerene commented 1 year ago

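2.3.0 has been released. Parallel Concordancer now supports one-to-many parallel searching, and you no longer need to set the source file and target file manually.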

daviddalao commented 1 year ago

Thanks for letting me know, Python guru. Best wishes!

	david

daviddalao commented 9 months ago

Hello, why is opening texts so slow? I have already split the text into small files.

	david

BLKSerene commented 9 months ago

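Splitting the text into small files does not improve speed, because large files are already read n lines at a time (this value can be changed in the settings panel under Files - Miscellaneous settings). When a file is opened, it has to be segmented into paragraphs, sentences, and tokens so that later statistical operations are easier. Speed can only be improved gradually from version to version; you could try the latest version, 3.4.0.

However much it is optimized, though, speed will still be limited mainly by the NLP packages themselves, such as the sentence tokenizers and word tokenizers. If you do not need high accuracy, for example with Indo-European languages such as English, you can change the word tokenizer for that language to the NLTK tokenizer in the settings panel. Rule-based or regex-based tokenizers are certainly faster than the default neural-network-based ones, but their accuracy is somewhat lower.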
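As a rough illustration of the point above (this is not Wordless's actual code), the sketch below reads a text n lines at a time and tokenizes each chunk with a simple regex-based tokenizer. The names `read_in_chunks`, `tokenize_file`, and `TOKEN_RE` are hypothetical, and the regex is only a stand-in for a rule-based tokenizer such as NLTK's. Because the reader already works in chunks, the chunk size does not change the resulting tokens, which is why pre-splitting a file into smaller files would not be expected to help.

```python
import io
import re
from itertools import islice

# Hypothetical rule-based tokenizer: a run of word characters,
# or a single non-space punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def read_in_chunks(f, n):
    """Yield lists of at most n lines from an open text file."""
    while True:
        chunk = list(islice(f, n))
        if not chunk:
            break
        yield chunk


def tokenize_file(f, n=100):
    """Tokenize an open text file chunk by chunk and return all tokens."""
    tokens = []
    for chunk in read_in_chunks(f, n):
        for line in chunk:
            tokens.extend(TOKEN_RE.findall(line))
    return tokens


# An in-memory file stands in for a real text file here.
sample_text = "Hello, world!\nThis is a test.\n"
tokens = tokenize_file(io.StringIO(sample_text), n=1)
print(tokens)
```

A regex like this does no model inference at all, which is the reason rule-based tokenizers tend to be faster than neural ones, at the cost of handling edge cases (contractions, abbreviations, languages without spaces) less accurately.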

BLKSerene / Wordless

Searching aligned texts is not very convenient #22