Open mittsommer opened 3 months ago
Do you mean space before the code or space in general? Could you provide a concrete example of a code block?
Guten Tag, thank you for your reply. We are working on outputting articles that contain code in Markdown format. Here is an example:
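This generates the demo API service in the current directory. The figure below shows the generated project directory structure: Write the logic in demologic.go under the logic directory: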
func (l *DemoLogic) Demo(req *types.Request) (resp *types.Response, err error) {
    // todo: add your logic here and delete this line
    return &types.Response{
        Message: "hello world",
    }, nil
}
In this case, all leading whitespace on the lines inside the code block was removed, which is unexpected and unfriendly for LLM training.
By the way, here is another (possible) bug when extracting inline code: a redundant '\n' is added after an inline code span. The current result is:
1.2. Implement the `WebMvcConfigurer`
interface and register the interceptor
which is supposed to be:
1.2. Implement the `WebMvcConfigurer` interface and register the interceptor
thank you
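To make the expected behaviour concrete, here is a minimal sketch of how a Markdown renderer could treat inline code differently from block code. The helpers `render_code` and `render_paragraph` are hypothetical names used only for illustration and are not part of trafilatura; the heading text is the example above in English translation.

```python
from typing import List, Tuple

FENCE = "`" * 3  # Markdown code fence, built here to keep the example readable


def render_code(code: str) -> str:
    """Render a code fragment as Markdown (hypothetical helper)."""
    if "\n" in code:
        # Block code: put the fence on its own lines and leave the content,
        # including indentation, untouched.
        return "\n" + FENCE + "\n" + code.strip("\n") + "\n" + FENCE + "\n"
    # Inline code: wrap in backticks and add no newline, so the surrounding
    # text stays on the same line.
    return "`" + code + "`"


def render_paragraph(parts: List[Tuple[str, str]]) -> str:
    """Join (kind, content) pairs, where kind is "text" or "code"."""
    return "".join(
        render_code(content) if kind == "code" else content
        for kind, content in parts
    )


heading = render_paragraph([
    ("text", "1.2. Implement the "),
    ("code", "WebMvcConfigurer"),
    ("text", " interface and register the interceptor"),
])
print(heading)
```

Under these assumptions the heading stays on a single line, while multi-line code still gets its own fenced block.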
Yes, spacing is not necessarily preserved in code blocks; this can be improved.
Hello, thanks for your continuous work on trafilatura. Recently, while using trafilatura for extracting content that mixes code and text, we noticed that the sanitize function removes all whitespace and tabs, even inside code blocks, when using the txt output format. We think the problem is that preserve_space=False is the default here: https://github.com/adbar/trafilatura/blob/2c9f20296c1c5ce9a23715a07df5b623f3016b65/trafilatura/xml.py#L315C5-L315C51
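The effect described above can be illustrated with a small standalone sketch. The function `sanitize_sketch` is a hypothetical stand-in written for this thread, not trafilatura's actual implementation; only the parameter name `preserve_space` is taken from the linked line, and the whitespace-collapsing behaviour is an assumption based on the report.

```python
import re


def sanitize_sketch(text: str, preserve_space: bool = False) -> str:
    """Hypothetical sanitizer illustrating the reported behaviour."""
    cleaned = []
    for line in text.splitlines():
        if preserve_space:
            # Keep indentation and inner spacing; drop trailing whitespace only.
            line = line.rstrip()
        else:
            # Assumed default: collapse whitespace runs and strip both ends,
            # which destroys the indentation of code lines.
            line = re.sub(r"\s+", " ", line).strip()
        cleaned.append(line)
    return "\n".join(cleaned)


snippet = "if ready:\n    return 1"
collapsed = sanitize_sketch(snippet)
preserved = sanitize_sketch(snippet, preserve_space=True)
print(collapsed)
print(preserved)
```

With the flag off, the indented line loses its leading spaces; with it on, the snippet comes back unchanged. Passing the flag only for text inside code elements would keep normal prose cleanup intact.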