Yoctol / strpipe

text preprocessing pipeline

add_step_by_op_name or add_steps_by_op_name? #39

Closed by SoluMilken 6 years ago

SoluMilken commented 6 years ago

In the README, the method is add_step_by_op_name:

import strpipe as sp

p = sp.Pipe()
p.add_step_by_op_name(
    op_name='Trim',
    op_kwargs={'tokens': ['\n', '\r']},
)
p.add_step_by_op_name('CharTokenize')
p.add_step_by_op_name(
    op_name='MapStringToIndex',
    state={'你': 0, '好': 1, '早': 2},  # if provided, p.fit won't change it
)

data = sp.TextData([
    '你好啊\n',
    '早安',
    '你早上好\n',
])

p.fit(data)
result, tx_info = p.transform(data)  # convention: tx => transform
back_data = p.inverse_transform(result, tx_info)

In pipe.py, the name differs: https://github.com/Yoctol/strpipe/blob/master/strpipe/pipe.py#L50
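
One possible way to settle a singular/plural mismatch like this without breaking existing callers is to pick one canonical name and keep the other spelling as a deprecated alias. The sketch below is only an illustration: the Pipe class here is a hypothetical stand-in, not strpipe's actual implementation, and treating the README spelling as canonical is an assumption.

```python
import warnings


class Pipe:
    """Hypothetical stand-in for a pipeline class; not strpipe's real Pipe."""

    def __init__(self):
        # Each step is stored as (op_name, op_kwargs, state).
        self.steps = []

    def add_step_by_op_name(self, op_name, op_kwargs=None, state=None):
        # Canonical spelling (assumed here to be the one the README uses).
        self.steps.append((op_name, op_kwargs or {}, state))

    def add_steps_by_op_name(self, *args, **kwargs):
        # Alternate spelling, kept only as a deprecated alias that forwards
        # to the canonical method.
        warnings.warn(
            "add_steps_by_op_name is deprecated; use add_step_by_op_name",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.add_step_by_op_name(*args, **kwargs)


p = Pipe()
p.add_step_by_op_name('CharTokenize')

# Calling the alias still registers the step, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    p.add_steps_by_op_name(op_name='Trim', op_kwargs={'tokens': ['\n', '\r']})
```

With an alias like this, the README and the code can be brought into agreement first, and the old spelling removed in a later release.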